invention further relates to polymorphic polynucleotides associated with osteoarthritis. The invention provides methods of
determining if a particular polymorphism predisposes an individual to or is associated with the development of osteoarthritis. The invention

NUCLEOTIDE POLYMORPHISMS ASSOCIATED WITH OSTEOARTHRITIS



TECHNICAL FIELD 

The invention relates in general to polymorphisms in genes associated with osteoarthritis and
bone remodeling and methods of identifying individuals having a gene containing a polymorphism
associated with osteoarthritis. The invention also relates to a method of detecting an increased
susceptibility to a disease in an individual resulting from the presence of a polymorphism or mutation in
the gene coding sequence of an osteoarthritis and bone remodeling associated gene.



BACKGROUND OF THE INVENTION

Single nucleotide substitutions and small unique insertions and deletions are the most frequent
form of DNA polymorphism and disease-causing mutation in the human genome. These DNA
sequence variations, called single nucleotide polymorphisms (SNPs), have gained popularity and have
been proposed as the genetic markers of choice for the study of complex genetic traits (Collins et al.
1997 Science 278: 1580-1581; Risch and Merikangas 1996 Science 273: 1516-1517). Despite the fact
that, on average, approximately one nucleotide position in every 1000 bases along the human
chromosome is estimated to differ between any two copies of the chromosome (Cooper et al. 1985
Human Genetics 69: 201-205; Kwok et al. 1996 Genomics 31: 123-126), developing SNP markers is
not easy.

It has been suggested that association studies (such as linkage disequilibrium studies) with a set
of single nucleotide polymorphism (SNP) markers evenly spaced across the genome at approximately
100 kb intervals would provide the necessary power to detect the small effects of each gene involved
in a complex trait (Hauser et al. 1996 Genetic Epidemiology 13: 117-137 in Kwok and Chen 1998
Genetic Engineering 20: 125-134, Plenum Press, New York). Alternatively, one can take a candidate
gene approach in performing association studies with the use of a set of gene-associated SNP
markers to detect these genetic factors (ibid.).

Nucleotide sequence mutations which occur in a gene or gene family, where the gene or gene 
family is associated with a given disease, may be the basis for susceptibility to or development of the 
disease. 

Arthritis means "inflammation of a joint" and encompasses more than a hundred diseases.
They can affect the joints and other connective tissues such as muscles, tendons, ligaments and
protective coverings of internal organs. The major arthritis diseases are as follows:

1. osteoarthritis - non-inflammatory degenerative joint disease characterised by splitting and
fragmentation of the articular cartilage, hypertrophy of the bone and changes in the synovial
membrane.

2. rheumatoid arthritis - chronic systemic, relapsing disease primarily of the joints which is marked
by inflammatory changes in the synovial membranes and adjacent structures.

3. ankylosing spondylitis - inflammatory disease that affects the joints of the lower back which
may lead to fusion of the spine.

4. gout - caused by formation of uric acid crystals in the joint, leading to inflammation and severe
pain.

Osteoarthritis is the most common type of arthritis. It differs from rheumatoid arthritis in that
it is primarily a degeneration of the joint tissue that may be accompanied by an inflammatory reaction
(Figure 1). Rheumatoid arthritis is an inflammatory disease first and foremost and inflammation of the
synovium is the focal point of the disease.

The initiation and progression of osteoarthritis involves multiple pathogenic mechanisms. An
imbalance of chondrocyte-controlled anabolic and catabolic processes results in a progressive
degradation of the components of the extracellular matrix of the articular cartilage, associated with
secondary inflammatory factors. The primary cause of this is unknown but possibly involves a
deficiency of cellular response to normal tissue demand or insufficient cellular response to
supernormal demand from mechanical loading or injury. The subsequent repair response could induce
elevated levels of anabolic molecules, leading to remodelling of the bone and production of osteophytes
(bone outgrowths) characteristic of the disease process.

Prevalence and social cost of osteoarthritis.

With approximately 40 million Americans affected by arthritis and other inflammatory
diseases, the cost to the healthcare system is significant. Of these 40 million people, 21 million have
osteoarthritis and 2.1 million have rheumatoid arthritis. Osteoarthritis is the most common chronic
condition and cause of inactivity in patients older than 65. The disease usually begins at the start
of the fifth decade of life, with increasing prevalence and incidence with advancing age (Table 2).
The prevalence of arthritis is expected to increase by 57% by the year 2020. In the same time period,
activity limitation caused by arthritis will increase 66% to 11.6 million people (Lawrence et al 1998).
The primary impact of arthritis in the elderly is decreased physical functioning. This can be due to
other health-related problems, such as weight gain, cardiovascular disease, GI distress related to
treatment, increased psychological distress, decreased social functioning, increased work disability, and
increased healthcare utilization. The current OA treatment, NSAIDs, is responsible for the highest
number of hospitalisations of any drug category and causes a significant number of internal
gastrointestinal bleeds in the elderly population.

The cost of arthritis in the US (including rheumatoid arthritis, osteoarthritis and all other
rheumatic conditions) was shown to be $64.8 billion in 1992. Of this, direct costs were an estimated
$15.2 billion and indirect costs $49.6 billion (Yelin and Callahan 1995). A 1997 study showed the
cost of care for osteoarthritis as $543 per patient per year (Lanes et al 1997). The largest component was
hospital care, mostly due to admissions for hip or knee replacement. The cost to the healthcare
provider is very high due to the prevalence of the illness.

Unmet medical needs for OA

Current treatment options for osteoarthritis focus on symptom relief, whereas truly
disease-modifying agents or methods are lacking. Thus, the basic therapy includes common analgesics,
nonsteroidal anti-inflammatory drugs, physical therapy, walking aids, and eventually, in severe cases,
joint replacement surgery. Perhaps because of the difficulties involved in measuring disease
progression, existing medications do not address the need to prevent further cartilage degradation.

To develop such drugs the following should be in place:

• Compounds that target appropriate biochemical pathways (e.g. Merck's MMP-3 antagonist).

• Clinical studies must be able to measure disease progression in a cost-effective and safe
fashion. This could be either an imaging technique or a biomarker that closely correlates with disease
progression.

• Disease progression should be detectable within a reasonable time scale (for example,
anti-inflammatory clinical studies use the WOMAC pain scale for a period of 6 weeks to measure
improvement due to medication).

• The efficacy of the new drug under development should be observable (using either the
imaging or biomarker method of assessment) in a sample size comparable to that of other clinical
trials.

How can genetics help? Genetic studies have the potential to:

• Detect novel drug targets in the appropriate pathways.

• Detect individuals with fast progressing osteoarthritis. This would allow a
pharmaceutical company to prove efficacy in a relatively small sample size and in a reasonable period
of time, thus cutting costs.

• Reduce variation from biomarker or imaging patterns. For example, let us
assume the following response to medication. Although there is a clear pattern of response to
medication, it is not statistically significant because of the large amount of variation in disease
progression. Let us now assume that there exists a genetic marker that is able to stratify the
measurement of disease progression in this hypothetical study. The variance of the marker of disease
progression associated with each genotype is smaller than the overall variance. This can be seen as
analogous to stratifying a relevant clinical measure in a study (e.g. lipid levels) by gender or by age
group. By pooling together both genders or both age groups the variance is larger. If we were now to
stratify the results of the previous hypothetical study by genotype we might observe that the
therapeutic efficacy is now statistically significant. By stratifying according to genotype it could then
be possible to detect statistically significant efficacy in both groups, while meeting the cost and time
needs of the entity developing the drug.
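The variance argument above can be illustrated with a small numerical sketch. The genotype labels and response values below are invented purely for illustration and are not data from this specification; the sketch only shows that the average variance within genotype groups is smaller than the variance of the pooled measurements when group means differ.

```python
from statistics import pvariance

# Hypothetical change in a disease-progression marker after treatment,
# grouped by genotype. All values are invented for illustration.
responses = {
    "AA": [1.0, 1.2, 0.9, 1.1],
    "AG": [2.0, 2.1, 1.9, 2.2],
    "GG": [3.0, 3.1, 2.9, 3.2],
}


def pooled_variance(groups):
    """Variance of all measurements with genotype ignored."""
    all_values = [v for values in groups.values() for v in values]
    return pvariance(all_values)


def within_genotype_variance(groups):
    """Sample-size-weighted average of the per-genotype variances."""
    total = sum(len(values) for values in groups.values())
    return sum(len(values) * pvariance(values) for values in groups.values()) / total


overall = pooled_variance(responses)
within = within_genotype_variance(responses)
print(f"pooled variance: {overall:.4f}, within-genotype variance: {within:.4f}")
```

A smaller within-group variance is what would make a treatment effect of a given size easier to detect in each stratum; whether a real marker behaves this way is an empirical question.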

Genetic study of osteoarthritis.

Evidence for genetic predisposition to OA.

The nature of the genetic influence in osteoarthritis may involve either a structural defect (that
is, collagen), alterations in cartilage or bone metabolism, or a genetic influence on a known risk factor
for osteoarthritis such as obesity. Twin studies have shown that between 39% and 65% of
osteoarthritis in the general population can be attributed to genetic factors (MacGregor and Spector,
1999). Linkage analyses (i.e., common inheritance of affected individuals in the same family) have
identified a higher risk ratio for relatives of affected individuals compared to the general population.
The power to detect disease-susceptibility loci through linkage analysis using pairs of affected relatives
depends on λR, the risk ratio for type R relatives compared with population prevalence (Risch 1990).
Kellgren et al. (1963) compared expected and observed incidence of osteoarthritis in first-degree
relatives of probands with multiple osteoarthritis. Based on their results we have estimated λR for nodal
and non-nodal osteoarthritis.



Type of OA                                  λR

Nodal (presence of Heberden's nodes)        4.5

Non-nodal                                   4.75

For comparison, concordance for type 2 diabetes ranges between 2 and 3, and between 4.5 and
5.5 for rheumatoid arthritis. These figures indicate a high genetic component to OA. If, however,
non-nodal and nodal types of OA are mixed together, λR drops to ~2.0, highlighting the importance of
careful clinical characterization for genetic studies.
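The risk ratio used above (λR in Risch 1990) is the risk to relatives of type R of an affected individual divided by the population prevalence. A minimal sketch of that ratio follows; the function name and the input values are hypothetical and are not taken from Kellgren et al. or from this specification.

```python
def relative_risk_ratio(risk_in_relatives, population_prevalence):
    """Return lambda_R = K_R / K.

    K_R is the risk of disease in type R relatives of an affected
    individual and K is the population prevalence.
    """
    if not 0 < population_prevalence <= 1:
        raise ValueError("population prevalence must be in (0, 1]")
    if not 0 <= risk_in_relatives <= 1:
        raise ValueError("risk in relatives must be in [0, 1]")
    return risk_in_relatives / population_prevalence


# Hypothetical inputs: 6% of first-degree relatives affected, 2% prevalence.
lambda_r = relative_risk_ratio(0.06, 0.02)
print(f"lambda_R = {lambda_r:.2f}")
```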

Although it is known that there is a genetic component involved in the etiology of osteoarthritis,
there is also a need in the art for an improved understanding of the genetic causes of osteoarthritis.

There is also a need in the art for identification of the genes associated with osteoarthritis, and
identification of sequence variations in these genes that are associated with osteoarthritis and bone
remodeling. The identification of disease related sequence variations in osteoarthritis and bone
remodeling associated genes will allow for the development of improved methods of screening for
osteoarthritis. These improved screening protocols may be used to identify individuals at high risk for
osteoarthritis and in need of preventative treatments.

The identification of disease related sequence variations in osteoarthritis associated genes may
facilitate the design of treatment protocols and the identification and design of compounds useful for
treatment of osteoarthritis and bone remodeling.

OBJECTS AND SUMMARY OF THE INVENTION 
An object of the present invention is to provide candidate genes associated with osteoarthritis
and bone remodeling.

It is another object of the present invention to provide a variant nucleotide in a candidate gene
associated with osteoarthritis and bone remodeling.

Another object of the present invention is to provide methods of detecting variant nucleotides
in a gene in individuals at risk for osteoarthritis.

Another object of the present invention is to provide methods of determining if a variant
nucleotide is associated with a predisposition to osteoarthritis.

Another object of the present invention is to provide candidate genes associated with
osteoarthritis and bone remodeling.

The invention further comprises isolated polynucleotides which contain the single nucleotide
polymorphisms selected from the Sequence Listing, or its perfect complement.

The invention further comprises an isolated polynucleotide segment of between 10 and 100
bases of which 10 contiguous bases including a polymorphic site are from a sequence selected from
the Sequence Listing, or its perfect complement.

The invention further comprises a probe or target sequence used for genotyping where the
probe or target sequence has at least 10 contiguous bases containing a polymorphic site identified and
from a sequence selected from the Sequence Listing, or its perfect complement.

The invention further comprises a method for determining a base occupying a polymorphic site
in a nucleic acid comprising obtaining the nucleic acid in a sample from an individual or plurality of
individuals and determining a base occupying a polymorphic site in a sequence selected from the group
consisting of the Sequence Listing and their perfect complements which occurs in the sample nucleic
acid.

DESCRIPTION OF THE COMPACT DISK-RECORDABLES (CD-R)

CD-R (Copy 1) contains the Sequence Listing formatted in plain ASCII text and Tables 1 and
2. CD-R (Copy 1) is labeled with Identification No. GX-0022P-1.

CD-R (Copy 2) is an exact copy of CD-R (Copy 1). CD-R (Copy 2) is labeled with
Identification No. GX-0022-1 P (Copy 2).

CD-R (Copy 3) contains the Computer Readable Form of the Sequence Listing in compliance
with 37 C.F.R. § 1.821(e), and specified by 37 C.F.R. § 1.824. CD-R (Copy 3) is labeled with
Identification No. GX-0022-1 P (Copy 3).

The material on CD-R 1, 2 and 3 is incorporated by reference into the specification.

BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS

These and other features, aspects, and advantages of the present invention will become better
understood with regard to the following description, appended claims, and accompanying tables
and drawings where:

Table 1 presents the genomic or cDNA structure of osteoarthritis candidate gene sequences
and the identity and position of polymorphisms which are the subject of the invention. This table has
the form wherein:

a. The DNA change given for an allele is not strand specific; it can
be on either strand of the DNA molecule.

b. Single Nucleotide Polymorphisms can be recorded as IUPAC ambiguity
symbols, as follows:

M    A or C
R    A or G
S    C or G
K    G or T
W    A or T
Y    C or T
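The two-base IUPAC ambiguity symbols listed in note b can be held in a simple lookup table. The sketch below is illustrative only; the function names `alleles_for` and `symbol_for` are hypothetical and are not part of the table format described here.

```python
# Two-base IUPAC ambiguity symbols as listed in note b.
IUPAC_TWO_BASE = {
    "M": ("A", "C"),
    "R": ("A", "G"),
    "S": ("C", "G"),
    "K": ("G", "T"),
    "W": ("A", "T"),
    "Y": ("C", "T"),
}


def alleles_for(symbol):
    """Return the bases represented by a plain base or an ambiguity symbol."""
    symbol = symbol.upper()
    if len(symbol) == 1 and symbol in "ACGT":
        return (symbol,)
    return IUPAC_TWO_BASE[symbol]


def symbol_for(first, second):
    """Return the ambiguity symbol for an unordered pair of distinct bases."""
    wanted = tuple(sorted((first.upper(), second.upper())))
    for symbol, bases in IUPAC_TWO_BASE.items():
        if bases == wanted:
            return symbol
    raise ValueError(f"no two-base symbol for {first!r} and {second!r}")


print(alleles_for("R"), symbol_for("T", "C"))
```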

c. Other allele types, such as insertions and deletions, are given in the form: ACA>AA
or AA>ACA and in such cases the coordinates of the allele include the two invariant
flanking bases.
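The insertion/deletion notation in note c can be read mechanically. The sketch below assumes, as in the two examples given, that the shorter form consists only of the two invariant flanking bases; `parse_indel` is a hypothetical helper name, not something defined by the table format.

```python
def parse_indel(change):
    """Parse a change such as 'ACA>AA' into (kind, inserted_or_deleted_bases).

    Assumes the shorter form is exactly the two invariant flanking bases,
    as in the examples ACA>AA and AA>ACA.
    """
    before, after = change.strip().upper().split(">")
    if len(before) == len(after):
        raise ValueError("not an insertion or deletion: " + change)
    is_deletion = len(before) > len(after)
    longer, shorter = (before, after) if is_deletion else (after, before)
    if len(shorter) != 2 or longer[0] != shorter[0] or longer[-1] != shorter[1]:
        raise ValueError("flanking bases are not invariant: " + change)
    kind = "deletion" if is_deletion else "insertion"
    return kind, longer[1:-1]


print(parse_indel("ACA>AA"))
print(parse_indel("AA>ACA"))
```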

d. DNA sequence names are of the form: XX:IIIIII[_VV], where XX gives the database of
origin, as follows:

EM    EMBL
FN    Incyte FL sequence read
GB    GenBank
IN    Incyte proprietary sequence
LG    LifeSeq Gold gene template

IIIIII gives the sequence ID or accession number for the sequence. In most cases if it is an
accession number it will be followed by _VV where VV is the sequence version in the EMBL or
GenBank database.
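A sequence name of the form described in note d (database prefix, colon, ID, optional underscore and version) could be split as sketched below. The regular expression assumes that the ID itself contains no underscore and that the version is numeric; both are assumptions, and the example names are invented.

```python
import re

# Database prefixes listed in note d.
DATABASES = {
    "EM": "EMBL",
    "FN": "Incyte FL sequence read",
    "GB": "GenBank",
    "IN": "Incyte proprietary sequence",
    "LG": "LifeSeq Gold gene template",
}

# Assumption: the ID has no underscore and the version is an integer.
NAME_PATTERN = re.compile(r"^(?P<db>[A-Z]{2}):(?P<id>[^_\s]+)(?:_(?P<version>\d+))?$")


def parse_sequence_name(name):
    """Split a sequence name into database, ID and optional version."""
    match = NAME_PATTERN.match(name)
    if match is None or match.group("db") not in DATABASES:
        raise ValueError("unrecognised sequence name: " + name)
    version = match.group("version")
    return {
        "database": DATABASES[match.group("db")],
        "id": match.group("id"),
        "version": int(version) if version else None,
    }


# Invented example names, for illustration only.
print(parse_sequence_name("GB:X00000_1"))
print(parse_sequence_name("IN:12345"))
```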

e. The overall structure of a record in the patent structure is described as follows. Items in
{braces} indicate a field that is filled in. Items in [square brackets] may or may not be present.
These entries define a larger virtual sequence (a "link") composed of real database subsequences.
Alleles are annotated onto real sequences, and genomic structure onto the link.

{Locus ID}
[Full name : {full name}]
Link : {link name}
    Subsequence {name} {link start position} {link stop position} {SEQ ID NO}
    [...]
    CDS {name} {SEQ ID NO}
        exon/ORF {link start position} {link stop position}
        [...]
    [...]
    Allele {seq name} {SEQ ID NO} {seq start} {seq stop} {dna change}
        source {original SNP data source} {SNP id in that source}
        [...]
        consequence {CDS name} {CDS SEQ ID NO} {class} [{peptide pos} {peptide change}]
        [...]
    [...]

f. Sources. SNPs may have been noted in one of several sources:

dbSNP     The NCBI public dbSNP databank.
isSNP     In silico SNPs from LifeSeq sequence assembly.
wetSNP    Alleles determined by SSCP.

Alleles which have a wetSNP entry are experimentally verified. Alleles which are isSNP
and/or dbSNP only are predictions by computer software of where these SNPs map to, and are *not*
experimentally verified.

g. Consequences

The classes of consequence are as follows:

Silent        The allele does not cause a peptide change
Missense      The allele causes an amino acid substitution
Frameshift    The allele causes a frame shift in the CDS
Intron        The allele lies wholly within an intron
5'            The allele lies 5' of the CDS
3'            The allele lies 3' of the CDS
Unknown       The consequence is undefined - for example the allele straddles
              an intron/exon boundary.

Silent and Missense consequences also supply details of the amino acid position of the change
and prediction of what the affected amino acid is, and what it is substituted to. There may be multiple
consequence lines if the locus contains multiple CDS forms.
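The distinction between the Silent and Missense classes can be sketched with the standard genetic code. This is an illustrative sketch only, not the software used to build Table 1: the function name, the 0-based position convention and the example CDS are assumptions, and a change that creates a stop codon is reported as Missense here because the class list above has no separate category for it.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(cds, position, new_base):
    """Classify a single-base substitution in a coding sequence.

    position is a 0-based index into cds (an assumption of this sketch).
    Returns (class, peptide position (1-based), old residue, new residue).
    """
    codon_start = position - position % 3
    offset = position - codon_start
    old_codon = cds[codon_start:codon_start + 3].upper()
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    peptide_pos = codon_start // 3 + 1
    kind = "Silent" if old_aa == new_aa else "Missense"
    return kind, peptide_pos, old_aa, new_aa


# Invented two-codon CDS: ATG (Met) GCT (Ala).
print(classify_substitution("ATGGCT", 5, "C"))  # GCT -> GCC
print(classify_substitution("ATGGCT", 4, "A"))  # GCT -> GAT
```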

h. Sequence and exon positions 

Sequence coordinates are always given on the forward strand of the link. Therefore, if a 
sequence or exon is actually on the reverse strand of the link, its start position will be larger than its 
stop position. 
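The coordinate convention in note h implies that strand can be inferred from the order of the two positions. A minimal sketch, with a hypothetical helper name and invented coordinates:

```python
def strand_and_span(start, stop):
    """Infer strand from link coordinates given on the forward strand.

    A start larger than the stop indicates a feature on the reverse strand.
    Returns (strand, lowest coordinate, highest coordinate).
    """
    if start <= stop:
        return "+", start, stop
    return "-", stop, start


print(strand_and_span(100, 250))
print(strand_and_span(900, 400))
```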

i. Exon order in CDS definitions 

The exons are given in 5' to 3' order. Consequently, reverse strand CDS start from high
coordinate numbers downwards.

j. Link object types

Loci may have more than one link object, composed of different DNA sequences. Typically there 
might be one genomic and one cDNA link object. 

Table 2 presents the population frequency of polymorphisms in the candidate genes and
summarizes various information from Table 2 relating to the polymorphism.

Figure 1 illustrates the cDNA structure of the locus and relative positions of identified SNPs 
for megakaryocyte stimulating factor (MSF). 

Figure 2 illustrates the genomic structure of the locus, exons composing multiple CDS, and 
relative positions of identified SNPs for megakaryocyte stimulating factor (MSF). 

The figures show (from left to right) the real sequences making up the linked genomic
structure for the locus, a scale in link coordinates (negative numbers would indicate a view of the
reverse strand), one or more CDSs representing the positions of exons, horizontal bars representing
the positions of identified SNPs (alleles) from the various sources, and shaded boxes showing regions
targeted for screening by SSCP.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Before the present compositions and methods are described, it is understood that embodiments 
of the invention are not limited to the particular machines, instruments, materials, and methods 
described, as these may vary. It is also to be understood that the terminology used herein is for the 
purpose of describing particular embodiments only, and is not intended to limit the scope of the 
invention. 

As used herein and in the appended claims, the singular forms "a," "an," and "the" include 
plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a 
nucleic acid probe" includes a plurality of such nucleic acid probes, and a reference to "a gene" is a 
reference to one or more genes and equivalents thereof known to those skilled in the art, and so forth. 

Unless defined otherwise, all technical and scientific terms used herein have the same 
meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. 
Although any machines, materials, and methods similar or equivalent to those described herein can be 
used to practice or test the present invention, the preferred machines, materials and methods are now 
described. All publications mentioned herein are cited for the purpose of describing and disclosing the 
cell lines, protocols, reagents and vectors which are reported in the publications and which might be 
used in connection with various embodiments of the invention. Nothing herein is to be construed as an 
admission that the invention is not entitled to antedate such disclosure by virtue of prior invention. 

Definitions

As used herein, "polymorphism" refers to a nucleotide alteration that either predisposes an
individual to a disease or is not associated with a disease, which occurs as a result of a substitution,
insertion or deletion.

More particularly, a "polymorphism" or "polymorphic variation" may be a nucleic acid
sequence variation, as compared to the naturally occurring sequence, resulting from either a nucleotide
deletion, an insertion or addition, or a substitution, which is present at a frequency of greater than 1%
in a population.
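The 1% frequency criterion in this definition can be checked from genotype counts. The sketch below interprets the criterion as the frequency of any allele other than the most common one, which is an assumption; the genotype data are invented.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as 2-character strings."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, threshold=0.01):
    """True if any allele other than the most common exceeds the threshold."""
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return any(f > threshold for f in freqs[1:])


# Invented sample: 97 AA homozygotes and 3 AG heterozygotes.
sample = ["AA"] * 97 + ["AG"] * 3
print(allele_frequencies(sample), is_polymorphic(sample))
```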

As used herein, "neutral polymorphism" refers to a polymorphism which is present at a 
frequency of greater than 1% in a population, which does not alter gene function or phenotype, and 
thus is not associated with a predisposition to or development of a disease. 

As used herein, "polynucleotide sequence" refers to a sense or antisense nucleic acid
sequence comprising RNA, cDNA, genomic DNA, synthetic forms and mixed polymers, that may be
chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases.

As used herein, "mutation" refers to a variation in the nucleotide sequence of a gene or
regulatory sequence as compared to the naturally occurring or normal nucleotide sequence. A
mutation may result from the deletion, insertion or substitution of more than one nucleotide (e.g., 2, 3,
4, or more nucleotides) or a single nucleotide change such as a deletion, insertion or substitution. The
term "mutation" also encompasses chromosomal rearrangements.

As used herein, "nucleic acid probe" refers to an oligonucleotide, nucleotide or polynucleotide,
and fragments and portions thereof, and to DNA or RNA of genomic or synthetic origin which may be
single- or double-stranded, which represents the sense or antisense strand. Both terms "nucleic acid
probe" and "DNA fragment" refer to a length of polynucleotide, for example, as small as 5
nucleotides, 10, 20, 25, 40, 50, 75, 100, 250, 400, 500 and 1 kb, and as large as 5-10 kb.

As used herein, "alteration" refers to a change in either a nucleotide or amino acid sequence, 
as compared to the naturally occurring sequence, resulting from a deletion, an insertion or addition, or 
a substitution. 

As used herein, "deletion" refers to a change in either nucleotide or amino acid sequence 
wherein one or more nucleotides or amino acid residues, respectively, are absent. 

As used herein, "insertion" or "addition" refers to a change in either nucleotide or amino acid 
sequence wherein one or more nucleotides or amino acid residues, respectively, have been added. 

As used herein, "substitution" refers to a replacement of one or more nucleotides or amino 
acids by different nucleotides or amino acid residues, respectively. 

As used herein, "specifically hybridizable" refers to a nucleic acid or fragment thereof that
hybridizes to another nucleic acid (or a complementary strand thereof) due to the presence of a region
that is at least approximately 90% homologous, preferably at least approximately 90-95% homologous,
and more preferably approximately 98-100% homologous, as are polynucleotides that hybridize to a
partner under stringent hybridization conditions. "Stringent" hybridization conditions are defined
hereinbelow for various hybridization protocols. A probe that is specifically hybridizable to a given
sequence can be used to detect a 1 bp out of 10 bp (10%) or a 1 bp out of 20 bp (5%) difference
between nucleic acid sequences and is therefore useful for discriminating between a wild type and a
mutant form of a gene of interest.
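The percentages in this definition (1 bp in 10 is a 10% difference, 1 bp in 20 is a 5% difference) follow from a position-by-position comparison. The sketch below computes percent identity between two equal-length sequences on the same strand; treating "homologous" as percent identity is an assumption of the sketch, and the sequences are invented.

```python
def percent_identity(probe, target):
    """Percentage of positions at which two equal-length sequences match."""
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


# Invented 10-base and 20-base sequences, each with one mismatch.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(percent_identity("ACGTACGTACACGTACGTAC", "ACGTACGTACACGTACGTAA"))
```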

As used herein, "amino acid sequence" refers to the sequential array of amino acids that have 
been joined by peptide bonds between the carboxylic acid group of one amino acid and the amino 
group of the adjacent amino acid to form long linear polymers comprising proteins. 

As used herein, "amino acid" refers to protein subunit molecules that contain a carboxylic acid
group, and an amino group, both linked to a single carbon atom.

A polypeptide is said to be "encoded" by a polynucleotide if the polynucleotide, either in its
native state or in a recombinant form can be transcribed and/or translated to produce the mRNA for
and/or the polypeptide or a fragment thereof.

As used herein, "gene" refers to a region of DNA which includes a portion which can be
transcribed into RNA, and which may contain an open reading frame, or coding region (also referred
to as an exon) which encodes a protein, a non-coding region (also referred to as an intron), and a
specific regulatory region comprising the DNA regulatory elements which control expression of the
transcribed region.

As used herein, "coding region" refers to a region of DNA which encodes a protein, also 
known as an exon. 

As used herein, "non-coding region" refers to a region of DNA which does not encode a 
protein coding region, also known as an intron, and is not included in the RNA molecule that is 
synthesized from a particular gene. 

As used herein, "regulatory region" refers to DNA sequences which are located either 5' of
the transcription start site, 3' of the transcription termination site, or within an intron or exon, and
which are capable of ensuring that the gene is transcribed at the proper time and in the appropriate
cell type.

As used herein, "consensus DNA sequence" or "wild-type DNA sequence" refers to a 
sequence wherein every position represents the nucleotide that occurs with the highest frequency 
when many actual sequences are compared. As used herein, "consensus DNA sequence" or "wild- 
type DNA sequence" also refers to the normal, naturally occurring DNA sequence. 
As used herein, a given sequence (or mutation or polymorphism) "associated with"
osteoarthritis refers to a nucleic acid sequence that increases susceptibility to the disease, predisposes
an individual to the disease or contributes to the disease, wherein the nucleic acid sequence is present
at a higher frequency (at least 5%, preferably 10%, more preferably 25% higher) in individuals with
the disease as compared to individuals who do not have the disease.
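The frequency thresholds in this definition can be expressed as a simple comparison between affected and unaffected groups. The definition does not say whether "higher" is relative or absolute; the sketch below assumes a relative increase, uses invented frequencies, and does not address statistical significance.

```python
def relative_excess(case_freq, control_freq):
    """Relative excess of a frequency in cases over controls (assumption: relative)."""
    if control_freq <= 0:
        raise ValueError("control frequency must be positive")
    return (case_freq - control_freq) / control_freq


def association_tier(case_freq, control_freq):
    """Return the highest threshold met (25, 10 or 5 percent), or None."""
    excess = relative_excess(case_freq, control_freq)
    for tier in (25, 10, 5):
        if excess >= tier / 100:
            return tier
    return None


# Invented frequencies in affected and unaffected individuals.
print(association_tier(0.30, 0.20))
print(association_tier(0.212, 0.20))
print(association_tier(0.20, 0.20))
```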

As used herein, a sequence "not associated with" osteoarthritis refers to a nucleic acid 
sequence that does not increase susceptibility to the disease, predispose an individual to the disease or 
contribute to the disease, wherein the nucleic acid sequence is not present at a higher frequency in 
individuals with the disease, and thus is present at a frequency about equal to its frequency in 
individuals who do not have the disease. 

As used herein, "amplifying" refers to producing additional copies of a nucleic acid sequence, 
preferably by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol, 
155: 335). 

As used herein, "oligonucleotide primers" refer to single stranded DNA or RNA molecules
that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid
strand. Oligonucleotide primers useful according to the invention are between 5 and 100 nucleotides in
length, preferably 20-60 nucleotides in length, and more preferably 20-40 nucleotides in length.

As used herein, "sequencing" refers to determining the precise nucleotide composition or
sequence of a nucleic acid region by methods well known in the art (see Ausubel et al., supra and
Sambrook et al., supra).

As used herein, "comparing" a sequence refers to determining if the nucleotides at one or
more positions in a particular region of a nucleic acid fragment are identical for any two or more
sequences. According to the invention, sequence comparisons can be performed by using computer
program analysis as described below in Section F entitled "Identification and Characterization of
Polymorphisms".

As used herein, "sequence differences" or "sequence variations" refer to nucleotide changes
at one or more positions between any two or more sequences being compared.

As used herein, "determining the presence of polymorphic variations" refers to using methods 
well known in the art to identify a nucleotide, at one or more positions within a particular nucleic acid 
region, that is distinct from the nucleotide present in the naturally occurring, wild-type or consensus 
sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution. 

As used herein, "determining the absence of polymorphic variations" refers to using methods
well known in the art to determine that the nucleotides present at every position analyzed in a
particular nucleic acid region are identical to the nucleotides present in the naturally occurring,
wild-type or consensus sequence.

As used herein, "genotyping" refers to determining the composition of the genetic material 
that is inherited by an organism from its parents. 

As used herein, "biological sample" refers to a tissue or fluid sample containing a
polynucleotide or polypeptide of interest, and isolated from an individual including but not limited to
plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory,
intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell
culture constituents.

As used herein, "amplimers" refer to a specific fragment of DNA generated by PCR that is
at least 30 bp in length and is preferably between 50 and 100 bp in length, and is more preferably
between 150-300 bp in length, with a melting temperature in the range of approximately 60-62°C.

As used herein, "phenotype" refers to the biological appearances of an organism or a tissue 
derived from an organism, wherein biological appearances include chemical, structural and behavioral 
attributes, and excludes genetic constitution. 

As used herein, "genotype" refers to the genetic material that is inherited by an organism from
its parents.

As used herein, "genetic susceptibility to osteoarthritis" refers to an increased risk of
developing osteoarthritis resulting from specific DNA differences relative to non-susceptible
individuals. Preferably an individual who is genetically susceptible to osteoarthritis has a 5-100%, and
more preferably a 25-50% greater chance of developing osteoarthritis, as compared to
non-susceptible individuals.

As used herein, "diagnostic" refers to the practice of identifying a disease from the signs and 
symptoms of an individual including the DNA sequences of genes that are associated with an 
increased susceptibility to the disease. "Diagnostic" also refers to the practice of stratifying patient 
populations based on the efficacy or toxicity of a composition, and the predictive placement of an
individual in a response stratum based on strata-associated parameters.

As used herein, "prognosis" refers to the possibility of recovering from a particular disease or 
condition, and also refers to risk assessment of developing a particular disease or condition. 

THE INVENTION 

Various embodiments of the invention include polynucleotides and polymorphic polynucleotides 
associated with a given human disease, for example, with osteoarthritis. The invention also provides a 
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gene sequence containing one or more polymorphic nucleotides associated with a predisposition to or 
the development of a given human disease such as osteoarthritis. The invention also relates to 
polypeptides encoded by the polynucleotides or the polymorphism-containing gene. The invention also 
provides methods of detecting a polymorphism according to the invention in individuals at risk for
osteoarthritis, and for determining if a given polymorphism is associated with a predisposition to the
disease. The invention also discloses polymorphism(s) that are either associated with or are not
associated with (i.e., are neutral) osteoarthritis. A polymorphism in a given gene can be utilized in
various diagnostic and therapeutic methods and procedures, for example, in nucleic acid and peptide
diagnosis, drug screening and design, and in gene and peptide therapy. A polymorphism associated
with a given gene can be utilized in various gene expression systems and assays designed to analyze
gene regulation and expression.

A. Design and Synthesis of Oligonucleotide Primers 

According to the present invention, oligonucleotide primers are disclosed that are useful for
determining the sequence of a particular allele of a gene. The invention also discloses oligonucleotide
primers designed to amplify a region of a gene that is known to contain a polymorphism. The invention
also discloses oligonucleotide primers designed to anneal specifically to a particular allele of a gene.

Oligonucleotide primers useful according to the invention are single-stranded DNA or RNA
molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second
nucleic acid strand. The primer is complementary to a portion of a target molecule present in a pool of
nucleic acid molecules. It is contemplated that oligonucleotide primers according to the invention are
prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a
fragment thereof is naturally-occurring, and is isolated from its natural source or purchased from a
commercial supplier. Oligonucleotide primers are 5 to 100 nucleotides in length, ideally from 20 to 40
nucleotides, although oligonucleotides of different length are of use.

Pairs of single-stranded DNA primers can be annealed to sequences within or surrounding a 
gene on chromosome Y in order to prime amplifying DNA synthesis of a region of a gene. A 
complete set of gene primers will allow synthesis of all of the nucleotides of the coding sequences, 
e.g., the exons, introns and control regions. Preferably, the set of primers will also allow synthesis of
both intron and exon sequences.

Allele-specific primers are also useful, according to the invention. Such primers will anneal 
only to a particular mutant allele (e.g. alleles containing a polymorphism), and thus will only amplify a
product if the template also contains the polymorphism. Allele specific primers that anneal only to a 
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wild type gene sequence are also useful according to the invention. 

Typically, selective hybridization occurs when two nucleic acid sequences are substantially 
complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides,
preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa,
M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that
a certain degree of mismatch at the priming site is tolerated. Such mismatch may be small, such as a 
mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which are defined as regions in 
which there exists a mismatch in an uninterrupted series of four or more nucleotides. 
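
The complementarity thresholds above can be illustrated with a minimal Python sketch. It scores the percentage of positions at which a primer pairs with an equal-length template site. The function names and the simple ungapped Watson-Crick comparison are assumptions made for illustration and are not taken from the cited reference; loops and gapped alignments are not modeled.

```python
# Watson-Crick partner of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer: str, template_site: str) -> float:
    """Percent of positions at which the primer pairs with the template site.

    Both sequences are written 5'->3' and must have the same length.
    The comparison is ungapped (an illustrative simplification).
    """
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must be the same length")
    # A perfectly complementary primer equals the site's reverse complement.
    perfect_primer = reverse_complement(template_site)
    matches = sum(p == t for p, t in zip(primer.upper(), perfect_primer))
    return 100.0 * matches / len(primer)
```

Under this sketch, a 20-nucleotide primer with a single mismatched base scores 95%, which would satisfy the "at least about 90%" criterion stated above.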

Numerous factors influence the efficiency and selectivity of hybridization of the primer to a
second nucleic acid molecule. These factors, which include primer length, nucleotide sequence and/or
composition, hybridization temperature, buffer composition and potential for steric hindrance in the 
region to which the primer is required to hybridize, will be considered when designing oligonucleotide 
primers according to the invention. 

A positive correlation exists between primer length and both the efficiency and accuracy with
which a primer will anneal to a target sequence. In particular, longer sequences have a higher melting
temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target
sequence, thereby minimizing promiscuous hybridization. Primer sequences with a high G-C content or
that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since
unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution.
However, it is also important to design a primer that contains sufficient numbers of G-C nucleotide
pairings since each G-C pair is bound by three hydrogen bonds, rather than the two that are found
when A and T bases pair to bind the target sequence, and therefore forms a tighter, stronger bond.
Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration
of organic solvents, e.g. formamide, that might be included in a priming reaction or hybridization
mixture, while increases in salt concentration facilitate binding. Under stringent annealing conditions,
longer hybridization probes (of use, for example, in Northern analysis), or synthesis primers, hybridize
more efficiently than do shorter ones, which are sufficient under more permissive conditions. Stringent
hybridization conditions typically include salt concentrations of less than about 1M, more usually less
than about 500 mM and preferably less than about 200 mM. Hybridization temperatures range from as
low as 0°C to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C.
Longer fragments may require higher hybridization temperatures for specific hybridization. As several 
factors affect the stringency of hybridization, the combination of parameters is more important than 
the absolute measure of a single factor. 
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Oligonucleotide primers can be designed with these considerations in mind and synthesized 
according to the following methods. 

1. Oligonucleotide Primer Design Strategy

The design of a particular oligonucleotide primer for the purpose of sequencing or PCR 
involves selecting a sequence that is capable of recognizing the target sequence, but has a minimal 
predicted secondary structure. The oligonucleotide sequence binds only to a single site in the target 
nucleic acid. Furthermore, the Tm of the oligonucleotide is optimized by analysis of the length and GC 
content of the oligonucleotide. Furthermore, when designing a PCR primer useful for the amplification 
of genomic DNA, the selected primer sequence does not demonstrate significant matches to 
sequences in the GenBank database (or other available databases). 
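
The design criteria in this paragraph (GC content, Tm, and a single binding site in the target) can be sketched in Python as follows. This is only an illustration: the text does not specify a Tm formula, so the Wallace rule (2°C per A/T and 4°C per G/C, a rough approximation usually applied to short oligonucleotides) is an assumption, as are all function names. The site count looks only for exact matches on the strand supplied.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Approximate Tm in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def count_binding_sites(primer: str, target: str) -> int:
    """Count exact, possibly overlapping, occurrences of primer in target."""
    primer, target = primer.upper(), target.upper()
    last_start = len(target) - len(primer)
    return sum(target.startswith(primer, i) for i in range(last_start + 1))
```

A candidate primer would pass the uniqueness criterion in this sketch only when `count_binding_sites` returns exactly 1 for the target sequence; a fuller check would also scan the opposite strand and tolerate mismatches.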

The design of a primer is facilitated by the use of readily available computer programs, 
developed to assist in the evaluation of the several parameters described above and the optimization of 
primer sequences. Examples of such programs are "PrimerSelect" of the DNAStar™ software
package (DNAStar, Inc.; Madison, WI), OLIGO 4.0 (National Biosciences, Inc.), PRIMER,
Oligonucleotide Selection Program, PGEN, and Amplify (described in Ausubel et al., 1995, Short
Protocols in Molecular Biology, 3rd Edition, John Wiley & Sons). Primers are designed with
sequences that serve as targets for other primers to produce a PCR product that has known
sequences on the ends which serve as targets for further amplification (e.g. to sequence the PCR
product). If many different genes are amplified with specific primers that share a common 'tail'
sequence, the PCR products from these distinct genes can subsequently be sequenced with a single
set of primers. Alternatively, in order to facilitate subsequent cloning of amplified sequences, primers
are designed with restriction enzyme site sequences appended to their 5' ends. Thus, all nucleotides of
the primers are derived from gene sequences or sequences adjacent to a gene, except for the few 
nucleotides necessary to form a restriction enzyme site. Such enzymes and sites are well known in the 
art. If the genomic sequence of a gene and the sequence of the open reading frame of a gene are 
known, design of particular primers is well within the skill of the art. 
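
The tailed-primer and restriction-site strategies described above amount to prepending a short sequence to the 5' end of a gene-specific primer. The following Python sketch illustrates this under stated assumptions: the function names are hypothetical, and the EcoRI recognition sequence GAATTC is used only as an example of a site (the text does not name a particular enzyme). The second function checks that the chosen site does not already occur within the amplified region, since an internal site would be cut during cloning.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# EcoRI recognition sequence, used here purely as an example site.
ECORI_SITE = "GAATTC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def add_5prime_extension(gene_specific: str, extension: str) -> str:
    """Prepend a tail or restriction site to the 5' end of a primer."""
    return extension.upper() + gene_specific.upper()


def site_absent_from_amplicon(site: str, amplicon: str) -> bool:
    """True if the site occurs on neither strand of the amplicon."""
    site, amplicon = site.upper(), amplicon.upper()
    return site not in amplicon and reverse_complement(site) not in amplicon
```

For example, `add_5prime_extension("ATGCC", ECORI_SITE)` yields `GAATTCATGCC`. In practice a few additional bases are often placed 5' of the site to help the enzyme cut near a fragment end; that detail is omitted here.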

2. Synthesis 

The primers themselves are synthesized using techniques which are also well known in the 
art. Once designed, oligonucleotides are prepared by a suitable method, e.g. the phosphoramidite 
method described by Beaucage and Caruthers (1981, Tetrahedron Lett., 22:1859) or the triester
method according to Matteucci et al. (1981, J. Am. Chem. Soc., 103:3185), both incorporated herein
by reference, or by other chemical methods using either a commercial automated oligonucleotide 
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synthesizer (which is commercially available) or VLSIPS™ technology. 

B. Production of a Polynucleotide Sequence 

The invention discloses polynucleotide sequences comprising polymorphisms. The
polynucleotide sequences of the invention are specifically hybridizable to a mutant form of a gene and
are therefore useful for discriminating between a wild-type form of a gene and a mutant form of a
gene. The polynucleotide sequences of the invention may also be useful for expression of the encoded
protein or a fragment thereof. The invention also features antisense polynucleotide sequences
complementary to polynucleotide sequences comprising polymorphisms. Antisense polynucleotide
sequences are useful according to the invention for inhibiting expression of an allelic form of a gene.

The present invention utilizes polynucleotide sequences and fragments comprising RNA, 
cDNA, genomic DNA, synthetic forms, and mixed polymers. The invention includes both sense and 
antisense strands of the polynucleotide sequences. According to the invention, the polynucleotide
sequences may be chemically or biochemically modified or may contain non-natural or derivatized
nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or
more of the naturally occurring nucleotides with an analog, internucleotide modifications such as
uncharged linkages (e.g. methyl phosphonates, phosphorodithioates, etc.), pendent moieties (e.g.,
polypeptides), intercalators (e.g. acridine, psoralen, etc.), chelators, alkylators, and modified linkages
(e.g. alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic
polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other
chemical interactions. Such molecules are known in the art and include, for example, those in which
peptide linkages substitute for phosphate linkages in the backbone of the molecule. 

The polynucleotide may be a naturally occurring polynucleotide, or may be a structurally
related variant of such a polynucleotide having modified bases and/or sugars and/or linkages. The
term "polynucleotide" as used herein is intended to cover all such variants.

Modifications, which may be made to the polynucleotide may include (but are not limited to) 
the following types: 
a) Backbone modifications 

i) phosphorothioates (X or Y or W or Z = S or any combination of two or more with the
remainder as O).

e.g. Y=S (Stein et al., 1988, Nucleic Acids Res., 15:3209), X=S (Cosstick and Vyle, 1989,
Tetrahedron Letters, 30:4693), Y and Z=S (Brill et al., 1989, J. Amer. Chem. Soc., 111:2321)

ii) methylphosphonates (e.g. Z=methyl (Miller et al., 1980, J. Biol. Chem., 255:9569))

iii) phosphoramidates (Z = N-(alkyl)2, e.g. alkyl = methyl, ethyl, butyl) (Z=morpholine or
piperazine) (Agrawal et al., 1988, Proc. Natl. Acad. Sci. USA, 85:7079) (X or W = NH) (Mag and
Engels, 1988, Nucleic Acids Res., 16:3525)

iv) phosphotriesters (Z=O-alkyl, e.g. methyl, ethyl, etc.) (Miller et al., Biochemistry, 21:5468)

v) phosphorus-free linkages (e.g. carbamate, acetamidate, acetate) (Gait et al., 1974, J.
Chem. Soc. Perkin I, 1684; Gait et al., 1979, J. Chem. Soc. Perkin I, 1389)

b) Sugar modifications 

i) 2'-deoxynucleosides (R=H)

ii) 2'-O-methylated nucleosides (R=OMe) (Sproat et al., 1989, Nucleic Acids Res., 17:3373)

iii) 2'-fluoro-2'-deoxynucleosides (R=F) (Krug et al., 1989, Nucleosides and Nucleotides, 8:1473)

c) Base modifications - (for a review see Jones, 1979, Int. J. Biolog. Macromolecules , 1:194) 

i) pyrimidine derivatives substituted in the 5-position (e.g. methyl, bromo, fluoro, etc.) or
replacing a carbonyl group by an amino group (Piccirilli et al., 1990, Nature, 343:33).

ii) purine derivatives lacking specific nitrogen atoms (e.g. 7-deaza adenine, hypoxanthine) or 
functionalized in the 8-position (e.g. 8-azido adenine, 8-bromo adenine) 

d) Polynucleotides covalently linked to reactive functional groups, e.g.:

i) psoralens (Miller et al., 1988, Nucleic Acids Res. Special Pub. No. 20:113), phenanthrolines
(Sun et al., 1988, Biochemistry, 27:6039), mustards (Vlassov et al., 1988, Gene, 72:313) (irreversible
cross-linking agents with or without the need for co-reagents)

ii) acridine (intercalating agents) (Helene et al., 1985, Biochimie, 67:777)

iii) thiol derivatives (reversible disulphide formation with proteins) (Connolly and Newman,
1989, Nucleic Acids Res., 17:4957)

iv) aldehydes (Schiff's base formation)

v) azido, bromo groups (UV cross-linking)

vi) ellipticines (photolytic cross-linking) (Perrouault et al., 1990, Nature, 344:358)

e) Polynucleotides covalently linked to lipophilic groups or other reagents capable of improving uptake 
by cells, e.g.: 

i) cholesterol (Letsinger et al., 1989, Proc. Natl. Acad. Sci. USA, 86:6553), polyamines
(Lemaitre et al., 1987, Proc. Natl. Acad. Sci. USA, 84:648), other soluble polymers (e.g. polyethylene
glycol)

f) Polynucleotides containing alpha-nucleosides (Morvan et al., Nucleic Acids Res., 15:3421)






WO 03/054166 



PCT/US02/41225 



g) Combinations of modifications a)-f)

It should be noted that such modified polynucleotides, while sharing features with 
polynucleotides designed as "anti-sense" inhibitors, are distinct in that the compounds correspond to 
sense-strand sequences and the mechanism of action depends on protein-nucleic acid interactions and
does not depend upon interactions with nucleic acid sequences.

1. Polynucleotide Sequences Comprising DNA 
a. Cloning 

Polynucleotide sequences comprising DNA can be isolated from cDNA or genomic libraries
(including YAC and BAC libraries) by cloning methods well known to those skilled in the art (Ausubel
et al., supra). Briefly, isolation of a DNA clone comprising a particular polynucleotide sequence
involves screening a recombinant DNA or cDNA library and identifying the clone containing the
desired sequence. Cloning will involve the following steps. The clones of a particular library are spread
onto plates, transferred to an appropriate substrate for screening, denatured, and probed for the
presence of a particular sequence. A description of hybridization conditions, and methods for
producing labeled probes is included below.

The desired clone is preferably identified by hybridization to a nucleic acid probe or by 
expression of a protein that can be detected by an antibody. Alternatively, the desired clone is 
identified by polymerase chain amplification of a sequence defined by a particular set of primers
according to the methods described below.

The selection of an appropriate library involves identifying tissues or cell lines that are an 
abundant source of the desired sequence. Furthermore, if the polynucleotide sequence of interest 
contains regulatory sequence or intronic sequence, a genomic library is screened (Ausubel et al.,
supra).

b. Genomic DNA

Polynucleotide sequences of the invention are amplified from genomic DNA. Genomic DNA 
is isolated from tissues or cells according to the following method. 

To facilitate detection of a variant form of a gene from a particular tissue, the tissue is isolated
free from surrounding normal tissues. To isolate genomic DNA from mammalian tissue, the tissue is
minced and frozen in liquid nitrogen. Frozen tissue is ground into a fine powder with a prechilled
mortar and pestle, and suspended in digestion buffer (100 mM NaCl, 10 mM TrisCl, pH 8.0, 25 mM
EDTA, pH 8.0, 0.5% (w/v) SDS, 0.1 mg/ml proteinase K) at 1.2 ml digestion buffer per 100 mg of
tissue. To isolate genomic DNA from mammalian tissue culture cells, cells are pelleted by
centrifugation for 5 min at 500 x g, resuspended in 1-10 ml ice-cold PBS, repelleted for 5 min at 500 x
g and resuspended in 1 volume of digestion buffer.

Samples in digestion buffer are incubated (with shaking) for 12-18 hours at 50°C, and then
extracted with an equal volume of phenol/chloroform/isoamyl alcohol. If the phases are not resolved
following a centrifugation step (10 min at 1700 x g), another volume of digestion buffer (without
proteinase K) is added and the centrifugation step is repeated. If a thick white material is evident at
the interface of the two phases, the organic extraction step is repeated. Following extraction the upper,
aqueous layer is transferred to a new tube to which will be added 1/2 volume of 7.5M ammonium
acetate and 2 volumes of 100% ethanol. The nucleic acid is pelleted by centrifugation for 2 min at
1700 x g, washed with 70% ethanol, air dried and resuspended in TE buffer (10 mM TrisCl, pH 8.0, 1
mM EDTA, pH 8.0) at 1 mg/ml. Residual RNA is removed by incubating the sample for 1 hour at 37°C
in the presence of 0.1% SDS and 1 mg/ml DNase-free RNase, and repeating the extraction and
ethanol precipitation steps. The yield of genomic DNA according to this method is expected to be
approximately 2 mg DNA/1 g cells or tissue (Ausubel et al., supra). Genomic DNA isolated
according to this method can be used for Southern blot analysis, restriction enzyme digestion, dot blot
analysis or PCR analysis, according to the invention.

c. Restriction digest (of cDNA or genomic DNA) 

Following the identification of a desired cDNA or genomic clone containing a particular 
sequence, polynucleotides of the invention are isolated from these clones by digestion with restriction 
enzymes. 

The technique of restriction enzyme digestion is well known to those skilled in the art (Ausubel
et al., supra). Reagents useful for restriction enzyme digestion are readily available from commercial
vendors including New England Biolabs, Boehringer Mannheim, Promega, as well as other sources.

d. PCR 

Polynucleotide sequences of the invention are amplified from genomic DNA or other natural 
sources by the polymerase chain reaction (PCR). PCR methods are well-known to those skilled in the 
art. 

PCR provides a method for rapidly amplifying a particular DNA sequence by using multiple 
cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify 
the target sequence of interest. PCR requires the presence of a nucleic acid to be amplified, two 
single stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase, 
deoxyribonucleoside triphosphates, a buffer and salts. 

The method of PCR is well known in the art. PCR is performed as described in Mullis and
Faloona, 1987, Methods Enzymol., 155: 335, herein incorporated by reference.

PCR is performed using template DNA (at least 1 fg; more usefully, 1-1000 ng) and at least
25 pmol of oligonucleotide primers. A typical reaction mixture includes: 2 µl of DNA, 25 pmol of
oligonucleotide primer, 2.5 µl of 10x PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4 µl of 1.25
mM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin-Elmer, Foster City, CA) and
deionized water to a total volume of 25 µl. Mineral oil is overlaid and the PCR is performed using a
programmable thermal cycler.

The length and temperature of each step of a PCR cycle, as well as the number of cycles, are
adjusted according to the stringency requirements in effect. Annealing temperature and timing are
determined both by the efficiency with which a primer is expected to anneal to a template and the
degree of mismatch that is to be tolerated. The ability to optimize the stringency of primer annealing
conditions is well within the knowledge of one of moderate skill in the art. An annealing temperature
of between 30°C and 72°C is used. Initial denaturation of the template molecules normally occurs at
between 92°C and 99°C for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99°C
for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and
extension (72°C for 1 minute). The final extension step is generally carried out for 4 minutes at 72°C,
and may be followed by an indefinite (0-24 hour) step at 4°C.
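
The cycling parameters above can be captured in a small Python sketch that checks a program against the stated ranges and estimates its programmed hold time. The class name, field names and default values are assumptions chosen from within the ranges given in the text. Temperature ramping between steps and the optional 4°C hold are ignored, so the result is a lower bound on actual run time.

```python
from dataclasses import dataclass


@dataclass
class CyclingProgram:
    """Illustrative PCR program; all durations are in seconds."""

    initial_denaturation_s: float = 240.0  # 4 minutes
    cycles: int = 30                       # text gives 20-40 cycles
    denaturation_s: float = 30.0           # text gives 15 s to 1 min
    annealing_s: float = 60.0              # text gives 1-2 min
    extension_s: float = 60.0              # 1 minute at 72 degrees C
    final_extension_s: float = 240.0       # 4 minutes at 72 degrees C

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the stated ranges."""
        if not 20 <= self.cycles <= 40:
            raise ValueError("cycles must be between 20 and 40")
        if not 15 <= self.denaturation_s <= 60:
            raise ValueError("denaturation must last 15-60 seconds")
        if not 60 <= self.annealing_s <= 120:
            raise ValueError("annealing must last 60-120 seconds")

    def total_seconds(self) -> float:
        """Sum of programmed hold times, excluding ramping and the 4 C hold."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (
            self.initial_denaturation_s
            + self.cycles * per_cycle
            + self.final_extension_s
        )
```

With the default values, the programmed holds add up to 4980 seconds (83 minutes).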

Several techniques for detecting PCR products quantitatively without electrophoresis may be
useful according to the invention in order to make it more suitable for easy clinical use. One of these
techniques, for which there are commercially available kits such as Taqman™ (Perkin Elmer, Foster
City, CA), is performed with a transcript-specific antisense probe. This probe is specific for the PCR
product (e.g. a nucleic acid fragment derived from a gene) and is prepared with a quencher and
fluorescent reporter probe complexed to the 5' end of the oligonucleotide. Different fluorescent
markers can be attached to different reporters, allowing for measurement of two products in one
reaction. When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the
probe bound to the template by virtue of its 5'-to-3' nucleolytic activity. In the absence of the
quenchers, the reporters now fluoresce. The color change in the reporters is proportional to the
amount of each specific product and is measured by a fluorometer; therefore, the amount of each
color can be measured and the PCR product can be quantified. The PCR reactions can be performed
in 96 well plates so that samples derived from many individuals can be processed and measured
simultaneously. The Taqman™ system has the additional advantage of not requiring gel
electrophoresis and allows for quantification when used with a standard curve.
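
Quantification against a standard curve can be sketched in Python as follows. The sketch assumes, following the proportionality stated above, that the fluorescence signal is linear in the amount of product; it fits a least-squares line to standards of known amount and inverts it for an unknown sample. The function names are hypothetical, and this is not a description of any instrument's software. Real-time instruments commonly fit a threshold cycle against the logarithm of starting quantity instead, which is not modeled here.

```python
from typing import Sequence, Tuple


def fit_standard_curve(
    amounts: Sequence[float], signals: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate the amount in an unknown sample."""
    if slope == 0:
        raise ValueError("a flat standard curve cannot be inverted")
    return (signal - intercept) / slope
```

Estimates are meaningful only for signals that fall within the range spanned by the standards.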

2. Polynucleotide Sequences Comprising RNA

The present invention also provides a polynucleotide sequence comprising RNA. A
polynucleotide comprising RNA is useful for detecting SNPs and polymorphisms by techniques
including but not limited to hybridization methods or the RNase protection method. A polynucleotide
comprising RNA is also useful as a template for the in vitro production of protein. A polynucleotide
comprising RNA is also useful for detecting and localizing specific mRNA sequences by in situ
hybridization.

Polynucleotide sequences comprising RNA can be produced according to the method of in 
vitro transcription. 

The technique of in vitro transcription is well known to those of skill in the art. Briefly, the
gene of interest is inserted into a vector containing an SP6, T3 or T7 promoter. The vector is linearized
with an appropriate restriction enzyme that digests the vector at a single site located downstream of
the coding sequence. Following a phenol/chloroform extraction, the DNA is ethanol precipitated,
washed in 70% ethanol, dried and resuspended in sterile water. The in vitro transcription reaction is
performed by incubating the linearized DNA with transcription buffer (200 mM TrisCl, pH 8.0, 40 mM
MgCl2, 10 mM spermidine, 250 mM NaCl [T7 or T3] or 200 mM TrisCl, pH 7.5, 30 mM MgCl2, 10 mM
spermidine [SP6]), dithiothreitol, RNase inhibitors, each of the four ribonucleoside triphosphates, and
either SP6, T7 or T3 RNA polymerase for 30 min at 37°C. To prepare a radiolabeled polynucleotide
comprising RNA, unlabeled UTP will be omitted and -SUTP will be included in the reaction mixture.
The DNA template is then removed by incubation with DNase I. Following ethanol precipitation, an
aliquot of the radiolabeled RNA is counted in a scintillation counter to determine the cpm/ml (Ausubel
et al., supra).

Alternatively, polynucleotide sequences comprising RNA are prepared by chemical synthesis 
techniques such as solid phase phosphoramidite (described above). 

3. Polynucleotide Sequences Comprising Oligonucleotides 

A polynucleotide sequence comprising oligonucleotides can be made by using oligonucleotide
synthesizing machines which are commercially available (described above).

4. Polynucleotide Sequences Encoding Fusion Proteins 
Polynucleotide sequences of the invention can be used to express the protein product (or
fragment thereof) of the gene of interest by inserting the polynucleotide sequence into an expression
vector. Expression vectors suitable for protein expression in mammalian cells, bacterial cells, insect
cells or plant cells are well known in the art and are described in Section H entitled "Production of a
Mutant Protein".

Polynucleotide sequences of the invention can be used to prepare hybrid polynucleotides
comprising a sequence of a gene adjacent to a sequence encoding a foreign protein or a fragment
thereof (e.g. lacZ, trpE, glutathione S-transferase or thioredoxin) or a protein tag (hemagglutinin or
FLAG). Such hybrid polynucleotides produce fusion proteins that are useful, according to the
invention, for improved expression and/or rapid isolation of a protein or protein fragment encoded by
the sequence of a gene. Hybrid polynucleotides are also useful as a source of antigen for the
production of antibodies.

Nucleic acid constructs comprising a polynucleotide of genomic, cDNA, synthetic or semi- 
synthetic origin in association with a polynucleotide sequence encoding a foreign protein or a fragment 
thereof, (carrier sequence) can be generated by recombinant nucleic acid techniques well known in 
the art (See Ausubel et al., supra). According to this method, the cloned gene is introduced into an 
expression vector at a position located 3' to a carrier sequence coding for the amino terminus of a 
highly expressed protein, an entire functional moiety of a highly expressed protein or the entire protein. 
It is preferable to use a carrier sequence from an E. coli gene or from any gene that is expressed at
high levels in E. coli. It is often preferable to select a carrier sequence that will facilitate protein
purification, either with antibodies, or with an affinity purification protocol that is specific for the
carrier protein being used. For example, the purification protocol can be designed in accordance with
the unique physical properties of the carrier protein (e.g. heat stability). Alternatively, the tag sequence
may encode a protein (e.g. glutathione-S-transferase (GST)) which can be purified by a
chemical interaction (for example glutathione purification of GST). Alternatively, some carrier
proteins, such as thioredoxin (Trx) can be selectively released from intact cells by osmotic shock or 
freeze/thaw procedures. Often, proteins that are fused to these carrier proteins can be purified away 
from intracellular contaminants by virtue of the physical attributes of the carrier protein (Ausubel et 
al., supra). 

To ensure that a fusion protein is useful, according to the invention, it may be necessary to 
modify the expression protocol to produce a soluble protein. Due to the fact that high-level expression 
of certain proteins can lead to the formation of inclusion bodies, if a soluble protein is required it may 
be necessary to modify the following variables. The temperature at which expression is induced can 
affect inclusion body formation since inclusion body formation is induced at higher temperatures (37°C 
and 42°C) and inhibited at lower temperatures (30°C). In certain instances, lowering the total level of 
protein expression can lead to an increase in the proportion of soluble protein that is produced. The 
strain background of the cells in which the protein is being produced can affect the proportion of a 
particular protein that is expressed in a soluble form. Furthermore, the choice of carrier protein can 
affect the solubility of an expressed fusion protein (Ausubel et al., supra). 
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An additional problem that can be encountered when producing fusion proteins in E. coli is 
formation of an unstable protein, or a protein that is cleaved at the site of the junction between the 
carrier sequence and the sequence of the protein of interest. To decrease complications due to protein 
instability one can arrange for the fusion protein to be expressed as insoluble aggregates. Alternatively,
one can express the fusion protein in E. coli strains that are deficient in proteases (Ausubel et al.,
supra).

Often it is useful to remove the carrier protein moiety from the protein of interest to facilitate
biochemical and functional analyses. Methods for cleavage of fusion proteins to remove the carrier
are known to those skilled in the art. The choice of a method is usually determined by the composition,
sequence, and physical characteristics of the particular protein. Reagents such as cyanogen bromide,
hydroxylamine or low pH can be used to chemically cleave fusion proteins. To avoid complications
resulting from chemical cleavage (e.g. the presence of chemical cleavage sites in the protein of
interest and/or the occurrence of side reactions resulting in protein modification), enzymatic cleavage
methods can be used. Enzymatic cleavage protocols are advantageous because they can be carried
out under relatively mild reaction conditions, and because they involve highly specific cleavage
reactions. Enzymes useful for enzymatic cleavage of fusion proteins include factor Xa, thrombin,
enterokinase, renin and collagenase (Ausubel et al., supra).

Recombinant constructs encoding fusion proteins wherein the carrier sequence is on the order
of 9-15 codons can be generated by PCR methods. According to this method, a PCR primer will be
designed to contain at least 13 nucleotides that are identical to the target sequence on either side of the
nucleotide sequence encoding the carrier sequence. Preferably, the PCR primer will also contain a
restriction enzyme site to facilitate cloning of the amplified product into an appropriate expression
vector. PCR will be carried out as described above and the sequence of the amplified product will be
confirmed by sequence analysis as described in Section D entitled "Isolation of a Wild type Gene".

Alternatively, recombinant constructs encoding fusion proteins can be generated by
site/oligonucleotide directed mutagenesis (Ausubel et al., supra). According to the method of site
directed mutagenesis the DNA to be mutated is inserted into a plasmid which has an F1 origin of
replication. A mutagenesis oligonucleotide is designed to contain 13 bp that are 100% identical to the
target sequence, on either side of a sequence coding for the 9-15 codons of carrier sequence that is to
be added by the mutagenesis protocol.
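
By way of illustration only, the layout of such a mutagenesis oligonucleotide can be sketched as follows. The function name, the example target sequence, the insertion position and the 9-codon carrier sequence are hypothetical and are not taken from the specification; only the 13-base flanks and the 9-15 codon carrier length come from the text.

```python
def design_insertion_oligo(target, insert_pos, carrier, flank=13):
    """Return carrier coding sequence flanked by `flank` target bases on each side.

    target     -- target (template) sequence, 5'->3'
    insert_pos -- 0-based index in `target` before which the carrier is inserted
    carrier    -- coding sequence of the carrier (9-15 codons)
    """
    target = target.upper()
    carrier = carrier.upper()
    if len(carrier) % 3 != 0 or not 9 <= len(carrier) // 3 <= 15:
        raise ValueError("carrier sequence should encode 9-15 codons")
    if insert_pos < flank or insert_pos + flank > len(target):
        raise ValueError("not enough target sequence on one side of the insertion point")
    left = target[insert_pos - flank:insert_pos]    # identical to target, upstream
    right = target[insert_pos:insert_pos + flank]   # identical to target, downstream
    return left + carrier + right


# Hypothetical example: a 39-base target and a 9-codon carrier sequence.
example_target = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTC"
example_carrier = "TACCCATACGATGTTCCAGATTACGCT"
oligo = design_insertion_oligo(example_target, 15, example_carrier)
print(len(oligo), oligo)
```

The sketch only assembles the sense-strand sequence; strand choice, melting temperature and any added restriction site would still have to be considered by the practitioner.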

A single stranded preparation of the vector is prepared by the following method. Following 
transformation of an appropriate bacterial strain (e.g. CJ236) with the recombinant plasmid and plating 
of the bacteria on LB agar plates, a single resulting colony is grown in 4x5 ml of LB plus ampicillin for
1 hour at 37°C with vigorous shaking. M13K07 helper phage (2 ml, approximately lO^-lO 11 plaque
forming units) is added and the bacteria are grown for an additional hour at 37°C with vigorous 
shaking. Following the addition of 7 ml of kanamycin (50 mg/ml), the bacteria are grown overnight at 
37°C with vigorous shaking. The following day bacterial cultures are pooled and cells are separated by 
centrifugation. After the addition of 2.6 ml of 20% polyethylene glycol 200-800/2M NaCl to 20 ml of 
bacterial supernatant, the sample is incubated for 1 - 1.5 hours on ice. The sample is pelleted by 
centrifugation at 9000 rpm for 20 minutes. Following removal of the supernatant, residual supernatant
is removed by centrifugation at 3000 rpm for 5 minutes. The pellet is resuspended in 400 ml of TE,
extracted twice with phenol and four times with phenol:chloroform and ethanol precipitated. The
resulting pellet is resuspended in 40 ml TE. 

Mutagenesis is performed by using a Muta-Gene kit (Bio-Rad, Hercules, CA) according to the
following method. To kinase the oligonucleotide primer, 1 µl (200 ng) of oligonucleotide is incubated in
the presence of 2 µl of 10X kinase buffer (0.5 M Tris, pH 8.0, 70 mM MgCl2, 10 mM DTT), 2 µl
10 mM rATP, 2 µl polynucleotide kinase and 13 µl H2O at 37°C for 1 hour. To carry out the
annealing and synthesis steps, 2.5 µl of single-stranded template are mixed with 1 µl of kinased
oligonucleotide, 1.0 µl of 10X annealing buffer (200 mM Tris-HCl, pH 7.4, 20 mM MgCl2, 500 mM
NaCl) and 5.5 µl H2O for 10 min at 65°C. The reaction mixture is slow-cooled to 37°C. Once the
sample has reached 37°C, the sample is spun briefly in a microfuge. Following the addition of 1.0 µl
of 10X synthesis buffer (5 mM each dATP, dCTP, dGTP, dTTP, 10 mM ATP, 100 mM Tris-HCl, pH
7.4, 50 mM MgCl2, 20 mM DTT), 1.0 µl T4 DNA ligase and 0.5 µl of T4 DNA polymerase, the
sample is incubated for 5 minutes on ice, 5 minutes at room temperature and 1 hour at 37°C. A 2 µl
aliquot of the sample is used to transform E. coli.
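
As a check on the arithmetic of the kinase and annealing reactions, the final buffer concentrations can be worked out from the stated volumes, which are taken here to be in a single common unit. The sketch below assumes that both buffers are 10X stocks and that the stated component volumes make up the whole reaction; the function name is hypothetical.

```python
def final_concentrations(stock_mM, stock_volume, total_volume):
    """Dilute each stock concentration (mM) by stock_volume / total_volume."""
    return {name: conc * stock_volume / total_volume
            for name, conc in stock_mM.items()}


# Annealing reaction: template + oligonucleotide + 10X buffer + water.
annealing_total = 2.5 + 1.0 + 1.0 + 5.5
annealing = final_concentrations(
    {"Tris-HCl": 200.0, "MgCl2": 20.0, "NaCl": 500.0}, 1.0, annealing_total)

# Kinase reaction: oligonucleotide + 10X buffer + rATP + kinase + water.
kinase_total = 1.0 + 2.0 + 2.0 + 2.0 + 13.0
kinase = final_concentrations(
    {"Tris": 500.0, "MgCl2": 70.0, "DTT": 10.0}, 2.0, kinase_total)

print(annealing_total, annealing)
print(kinase_total, kinase)
```

Under these assumptions each buffer is diluted ten-fold, i.e. to 1X, in its reaction.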

DNA is isolated from the transformed E. coli cells by mini prep methods known in the art
(Ausubel et al., supra), and sequenced according to methods known in the art (described in Section D
entitled "Isolation of a Wild Type Gene").

C. Production of a Nucleic acid Probe 

The invention discloses nucleic acid probes. Preferably, the nucleic acid probes of the
invention are specifically hybridizable to a mutant gene but not to a wild type form of a gene due to the
presence of one or more polymorphisms. These allele specific probes can be used to screen DNA
sequences of a gene which have been amplified by PCR, or are present in a genomic DNA or RNA
test sample. Hybridization of a particular allele specific probe to an amplified gene sequence, under
stringent conditions (described below), indicates that the polymorphism contained in the probe is
present in the amplified sequence. Hybridization of a particular allele specific probe to a test sample
comprising genomic DNA or RNA, under stringent conditions (described below), indicates that the
polymorphism contained in the probe is present in the nucleic acid of the test sample. Nucleic acid
probes that are specifically hybridizable to a wild type form of a gene but not to a mutant form of a
gene are also useful according to the invention.

In another embodiment, the probes of the claimed invention will be specific for a nucleic acid
region that is adjacent to a region that is thought to contain one or more polymorphisms. These probes
will be useful for detecting the presence of one or more polymorphisms in the adjacent region by the
method of primer extension (as described in Section F entitled "Identification and Characterization of
Polymorphisms").

In other embodiments, probes of the claimed invention will be used to detect a gain or loss of a 
restriction enzyme site known to contain one or more polymorphisms of the claimed invention. Nucleic 
acid probes, according to this embodiment, are able to detect a restriction enzyme fragment that is of a 
size that can be easily separated on an agarose gel and visualized by Southern blot analysis. Probes 
that are useful according to this embodiment of the claimed invention can be specific for any region 
within a gene or outside of a gene. 
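
By way of illustration only, the gain or loss of a restriction enzyme site can be modeled as a change in the predicted fragment lengths. The sketch below uses the EcoRI recognition sequence GAATTC, which is cut after the first G; the two allele sequences and the function name are hypothetical and are not taken from the specification.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one base inside the recognition site.
allele_with_site = "ACGT" * 5 + "GAATTC" + "TTGCA" * 4
allele_without_site = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 4

print(fragment_lengths(allele_with_site, "GAATTC", 1))
print(fragment_lengths(allele_without_site, "GAATTC", 1))
```

In this toy case one allele yields two fragments and the other a single uncut fragment, which is the kind of size difference a probe would reveal on a Southern blot.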

The nucleic acid probes of the invention are useful for a variety of hybridization-based
analyses including but not limited to Southern hybridization to genomic DNA, cDNA sequences or
PCR amplification products, Northern hybridization to mRNA and RNase protection assays, DNA
sequencing and isolation of genomic or cDNA clones of a gene. The probes may also be used to
determine whether mRNA encoded by a gene is present in a cell or tissue by the method of in situ
hybridization. These techniques are well known in the art and can be performed as described in
Ausubel et al., supra.

According to the methods of the above-referenced hybridization assays, polymorphisms
associated with alleles of a gene, which either predispose to a particular disease (e.g. osteoarthritis) or
are not associated with a particular disease (e.g. osteoarthritis), will be detected by the formation of a
stable hybrid consisting of a polynucleotide probe comprising one or more polymorphisms and a target
sequence that also comprises one or more polymorphisms, under stringent to moderately stringent
hybridization and wash conditions. If it is expected that the probes will be perfectly complementary to
the target sequence, stringent conditions will be used. Hybridization stringency may be lessened if
some mismatching is expected, for example, if variants are expected with the result that the probe will
not be completely complementary. Conditions are chosen which rule out nonspecific/adventitious
bindings, that is, which minimize noise. Since such indications identify neutral DNA polymorphisms as
well as mutations, these indications need further analysis (such as assays described in Section F
entitled "Identification and Characterization of Polymorphisms") to demonstrate detection of a
susceptibility allele of a gene.

Probes for alleles of a gene may be derived from genomic DNA or cDNA sequences
specific for the gene of interest. The probes may be of any suitable length, which span all or a portion
of the region containing the gene. If the target sequence contains a sequence identical to that of the 
probe, the probes may be short, e.g., in the range of about 8-30 base pairs, since the hybrid will be 
relatively stable under even stringent conditions. If some degree of mismatch is expected with the 
probe, i.e., if it is suspected that the probe will hybridize to a variant region, a longer probe may be 
employed which hybridizes to the target sequence with the requisite specificity. 

Probes according to the invention also include an isolated polynucleotide attached to a label or
a reporter molecule which may be useful for isolating other polynucleotide sequences having
sequence similarity by standard methods, including but not limited to the above-referenced
hybridization-based assays. Techniques for preparing and labeling probes (as described in Ausubel et
al., supra) are included below. A wide variety of labels and conjugation techniques are known by those
skilled in the art and can be used in various nucleic acid and amino acid assays. Means for producing
labeled hybridization or PCR probes for detecting related sequences include oligolabeling, nick
translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the
protein-encoding sequence, or any portion of it, may be cloned into a vector for the production of an mRNA
probe. Such vectors are known in the art, are commercially available, and may be used to synthesize
RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and
labeled nucleotides.

A number of companies such as Pharmacia Biotech (Piscataway NJ), Promega (Madison
WI) and US Biochemical Corp (Cleveland OH) supply commercial kits and protocols for these
procedures. Suitable reporter molecules or labels include radionuclides, enzymes, fluorescent,
chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic
particles and the like. Patents teaching the use of such labels include US Patents 3,817,838; 3,350,752;
3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant immunoglobulins may be
produced as shown in US Patent No. 4,816,567, incorporated herein by reference.

Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention
may be derived from naturally occurring or recombinant single- or double-stranded polynucleotides, or
be chemically synthesized.

Portions of the polynucleotide sequence having at least approximately 5 nucleotides,
preferably 9-15 nucleotides, fewer than about 6 kb and usually fewer than about 1 kb, from a
polynucleotide sequence encoding a gene are preferred as probes.

A DNA probe useful according to the present invention can be isolated from a gene or a
polynucleotide construct derived from a gene, or from a cDNA sequence specific for a gene or a
cDNA construct specific for a gene by the methods of PCR or restriction enzyme digestion, as
described above. Riboprobes useful according to the invention can be synthesized by the method of in
vitro transcription, or by chemical synthesis methods, as described above.

An oligonucleotide probe useful according to the invention can be designed, as described 
above, and synthesized in a commercially available automated synthesizer. 

Nucleic acid hybridization rate and stability will be affected by a variety of experimental 
parameters including salt concentration, temperature, the presence of organic solvents, the viscosity of 
the hybridization solution, the base composition of the probe, the length of the duplex, and the number 
of mismatches between the hybridizing nucleic acids (Ausubel et al, supra), and as described in 
Section A entitled "Design and Synthesis of Oligonucleotide Primers". 
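
By way of illustration only, the dependence of hybrid stability on several of the parameters listed above can be sketched with a widely quoted empirical approximation for the melting temperature of long DNA:DNA hybrids. The formula and its coefficients are not taken from this specification, vary somewhat between references, and are not suited to very short oligonucleotides; the function name and example probe are hypothetical.

```python
import math


def approx_tm(seq, na_molar, formamide_pct=0.0, mismatch_pct=0.0):
    """Approximate melting temperature (deg C) of a DNA:DNA hybrid.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L,
    lowered by about 1 deg C for each 1% of mismatched bases (an assumed
    rule of thumb).
    """
    seq = seq.upper()
    length = len(seq)
    gc_pct = 100.0 * (seq.count("G") + seq.count("C")) / length
    tm = (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct
          - 0.61 * formamide_pct - 500.0 / length)
    return tm - mismatch_pct


probe = "ATGC" * 10  # hypothetical 40-base probe, 50% GC
print(approx_tm(probe, 1.0))
print(approx_tm(probe, 0.1))
print(approx_tm(probe, 1.0, mismatch_pct=5.0))
```

The sketch reproduces the qualitative behavior described in the text: lowering the salt concentration, adding an organic solvent, shortening the duplex or introducing mismatches each lowers the estimated stability of the hybrid.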

Southern blot analysis can be used to detect sequence variations in a gene from a PCR
amplified product or from a total genomic DNA test sample via a non-PCR based assay. The method
of Southern blot analysis is well-known in the art (Ausubel et al., supra; Sambrook et al., 1989,
Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY). This technique involves the transfer of DNA fragments from an electrophoresis
gel to a membrane support resulting in the immobilization of the DNA fragments. The resulting
membrane carries a semipermanent reproduction of the banding pattern of the gel.

Southern blot analysis is performed according to the following method. Genomic DNA (5-20
µg) is digested with the appropriate restriction enzyme and separated on a 0.6-1.0% agarose gel in
TAE buffer. The DNA is transferred to a commercially available nylon or nitrocellulose membrane
(e.g. Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art
(Ausubel et al., supra; Sambrook et al., supra). Following transfer and UV cross linking, the membrane
is hybridized with a radiolabeled probe in hybridization solution (e.g. under stringent conditions in 5X
SSC, 5X Denhardt solution, 1% SDS) at 65°C. Alternatively, high stringency hybridization can be
performed at 68°C or in a hybridization buffer containing a decreased concentration of salt, for
example 0.1X SSC. The hybridization conditions can be varied as necessary according to the
parameters described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS and at
65°C in 0.2X SSC/0.1% SDS, and exposed to film. The stringency of the wash buffers can also be
varied depending on the amount of the background signal (Ausubel et al., supra).

Detection of a nucleic acid probe-target nucleic acid hybrid will include the step of hybridizing
a nucleic acid probe to the DNA target. This probe may be radioactively labeled or covalently linked
to an enzyme such that the covalent linkage does not interfere with the specificity of the hybridization.
A resulting hybrid can be detected with a labeled probe. Methods for radioactively labeling a probe
include random oligonucleotide primed synthesis, nick translation or kinase reactions (see Ausubel et
al., supra). Alternatively, a hybrid can be detected via non-isotopic methods. Non-isotopically labeled
probes can be produced by the addition of biotin or digoxigenin, fluorescent groups, chemiluminescent
groups (e.g. dioxetanes, particularly triggered dioxetanes), enzymes or antibodies. Typically,
non-isotopic probes are detected by fluorescence or enzymatic methods. Detection of a radiolabeled
probe-target nucleic acid complex can be accomplished by separating the complex from free probe
and measuring the level of complex by autoradiography or scintillation counting. If the probe is
covalently linked to an enzyme, the enzyme-probe conjugate-target nucleic acid complex will be
isolated away from the free probe-enzyme conjugate and a substrate will be added for enzyme
detection. Enzymatic activity will be observed as a change in color development or luminescent output
resulting in a 10^3-10^6 increase in sensitivity. An example of the preparation and use of nucleic acid
probe-enzyme conjugates as hybridization probes (wherein the enzyme is alkaline phosphatase) is
described in Jablonski et al., 1986, Nucleic Acids Res., 14:6115.

Two-step label amplification methodologies are known in the art. These assays are based on
the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid
probe capable of specifically binding to a gene. Allele specific gene probes are also useful according
to this method.

According to the method of two-step label amplification, the small ligand attached to the
nucleic acid probe will be specifically recognized by an antibody-enzyme conjugate. For example,
digoxigenin will be attached to the nucleic acid probe and hybridization will be detected by an
antibody-alkaline phosphatase conjugate wherein the alkaline phosphatase reacts with a chemiluminescent
substrate. For methods of preparing nucleic acid probe-small ligand conjugates, see Martin et al.,
1990, BioTechniques, 9:762. Alternatively, the small ligand will be recognized by a second
ligand-enzyme conjugate that is capable of specifically complexing to the first ligand. A well known example
of this manner of small ligand interaction is the biotin-avidin interaction. Methods for labeling nucleic
acid probes and their use in biotin-avidin based assays are described in Rigby et al., 1977, J. Mol. Biol.,
113:237 and Nguyen et al., 1992, BioTechniques, 13:116.

Variations of the basic hybrid detection protocol are known in the art, and include
modifications that facilitate separation of the hybrids to be detected from extraneous materials and/or
that employ the signal from the labeled moiety. A number of these modifications are reviewed in, e.g.,
Matthews & Kricka, 1988, Anal. Biochem., 169:1; Landegren et al., 1988, Science, 242:229; Mifflin,
1989, Clinical Chem. 35:1819; U.S. Pat. No. 4,868,105, and in EPO Publication No. 225,807.

D. Isolation of a Wild Type Gene

A wild type version of a candidate gene according to the invention can be isolated by cloning
from an appropriately selected genomic library according to methods well known in the art. Methods
of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."

The sequence of the cloned gene will be determined by sequencing methods well known in the
art (see Ausubel et al., supra and Sambrook et al., supra). Methods of sequencing employ such
enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical Corp,
Cleveland, OH), Taq polymerase (Perkin Elmer, Norwalk, CT), thermostable T7 polymerase
(Amersham, Chicago, IL), or combinations of recombinant polymerases and proofreading
exonucleases such as the ELONGASE Amplification System (Gibco BRL, Gaithersburg, MD).
Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton,
Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI 377 DNA
sequencers (Perkin Elmer).

E. Isolation of a Mutant Gene

A mutant version of a candidate gene according to the invention can be isolated by cloning 
from an appropriately selected genomic library according to methods well known in the art. Methods 
of cloning are described in Section B entitled "Production of a Polynucleotide Sequence." 

The sequence of the cloned gene will be determined by sequencing methods described in
Section D entitled "Isolation of a Wild Type Gene."

F. Identification and Characterization of Polymorphisms

a. Identification of SNPs by in silico methods (isSNPs) 
1. Identification of Polymorphisms in Candidate Genes 
The starting point is a set of experimentally derived nucleic acid sequences. In order to be
useful for SNP discovery by the invention, it is preferred that the sequences have complete
chromatogram files from a gel or capillary electrophoresis sequencing machine. When this is not
available, quality score data which assigns a score to each base in the sequence indicating the
likelihood of error for the basecall may be used. If neither of these data are available, the sequence
may be used to assist the clustering of other sequences and in some cases to provide additional
verification for a discovered SNP, but is not used by the invention for the identification of the
polymorphism.
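
By way of illustration only, the relation between a per-base quality score and the likelihood of a base call error can be sketched as follows. The sketch assumes phred-style scores, for which Q = -10 log10(P) where P is the estimated error probability; the function names, the example alignment column and the default cutoff of 15 are illustrative assumptions.

```python
def phred_to_error_probability(quality):
    """Estimated probability that a base call with this phred score is wrong."""
    return 10 ** (-quality / 10)


def high_quality_alleles(column, min_quality=15):
    """Distinct bases at one alignment position whose score meets the cutoff.

    column -- list of (base, phred_quality) pairs, one per aligned read
    """
    kept = [base for base, quality in column if quality >= min_quality]
    return sorted(set(kept))


# Hypothetical alignment column: five reads covering the same position.
column = [("A", 40), ("A", 32), ("G", 12), ("G", 25), ("T", 8)]
print(phred_to_error_probability(20))
print(high_quality_alleles(column))
```

In this toy column the low-scoring T call is discarded, while A and G each remain supported by at least one call above the cutoff.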

The population of sequences used may constitute either a database of cDNA-derived
sequences or genomic sequence. In a preferred embodiment, sequences used by the invention are
from an assembled cDNA database, such as the LifeSeqGold database (Incyte Genomics,
Inc. (Incyte), Palo Alto, CA).

Derivation of Nucleic Acid Sequences

cDNA was isolated from libraries constructed using RNA derived from normal and diseased
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction
were selected from a broad range of sources to provide a diverse population of cDNAs representative
of gene transcription throughout the human body. Descriptions of the human tissues and cell lines
used for cDNA library construction are provided in the LIFESEQ database (Incyte Pharmaceuticals,
Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example,
cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system,
musculoskeletal, neural, reproductive, and urologic sources.

Cell lines used for cDNA library construction were derived from, for example, leukemic cells,
teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such
cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines
commonly used and available from public depositories (American Type Culture Collection, Manassas
VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as
5-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of
leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S.
Biochemical Corporation, Cleveland OH), Taq polymerase (The Perkin-Elmer Corporation
(Perkin-Elmer), Norwalk CT), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham
Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases
such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life
Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer
annealed to the DNA template of interest. Methods have been developed for the use of both
single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed
on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled
nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for
mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have
been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB
2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler
(PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal
cycler (Perkin-Elmer). Sequencing can be carried out using, for example, the ABI 373 or 377
(Perkin-Elmer) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale
CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the
art.

The nucleotide sequences have been prepared by current, state-of-the-art, automated methods
and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified
nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance
to practicing the invention for those skilled in the art. Several methods employing standard
recombinant techniques may be used to correct errors and complete the missing sequence information.
(See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John
Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Press, Plainview NY.)

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well known
in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single
or many different transcripts. Assembly of the sequences can be performed using such programs as
PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG),
or other methods known in the art.

Alternatively, cDNA sequences are used as "component" sequences that are assembled into
"template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified,
and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway
known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Pharmaceuticals, Palo
Alto, CA). A series of BLAST comparisons is performed and low-information segments and
repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to
prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The
processed sequences are then loaded into a relational database management system (RDMS) which
assigns edited sequences to existing templates, if available. When additional sequences are added into
the RDMS, a process is initiated which modifies existing templates or creates new templates from
works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences
themselves. After the new sequences have been assigned to templates, the templates can be merged
into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.

A resultant template sequence may contain either a partial or a full length open reading frame,
or all or part of a genetic regulatory element. This variation is due in part to the fact that the full
length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length.
With current technology, cDNAs comprising the coding regions of large genes cannot be cloned
because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second
strand" synthesis. Template sequences may be extended to include additional contiguous sequences
derived from the parent RNA transcript using a variety of methods known to those of skill in the art.
Extension may thus be used to achieve the full length coding sequence of a gene.

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are
well known in the art. (See, e.g., Ausubel, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853). These analyses comprise
reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett,
J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and
homology searches.
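
By way of illustration only, an analysis of potential start and stop codons can be sketched as a scan of the three forward reading frames for the longest stretch that begins with ATG and ends at the first in-frame stop codon. This simple scan is an assumed stand-in and does not implement the codon periodicity method cited above; the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (length_nt, start_index, frame) of the longest forward-strand ORF.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    Returns (0, 0, 0) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start         # include the stop codon
                if length > best[0]:
                    best = (length, start, frame)
                start = None                   # look for the next ORF
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))
```

A fuller analysis would also examine the reverse complement and partial reading frames that run off the end of a template.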

Computer programs known to those of skill in the art for performing computer-assisted
searches for amino acid and nucleic acid sequence similarity include, for example, Basic Local
Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al.
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal
and for which the alignment score meets or exceeds a threshold or cutoff score set by the user
(Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool
(e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be
searched for sequences containing regions of homology to a query rbosm or RBOSM of the present
invention.

Other approaches to the identification, assembly, storage, and display of nucleotide and
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein in their entirety.

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif,
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in
"Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence
Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.

Identification of Sequence Variants and Polymorphisms 

The method comprises a series of filters to distinguish isSNPs from other sequencing variants and
errors. The filters can be grouped into the following five sets of filters by the order of application in
the method:

Preliminary Filters: the main filter in the first group removes the majority of base call errors by
requiring a minimum phred quality score of 15. Additional filters at this stage deal with sequence
alignment errors as well as errors resulting from improper trimming of vector sequence, chimeras and
splice junctions.

Advanced Chromatogram Analysis: additional base call errors are then detected by examining 
the original chromatogram files in the vicinity of a putative SNP by an automated procedure resulting 
in a set of SNPs wherein the base call error rate is reduced to less than 5%. 

Clone Error Filters: errors introduced during laboratory processing such as those caused by
reverse transcriptase, polymerase or somatic mutation are among the most difficult to distinguish from
true SNPs. The Clone Error filters use statistically generated algorithms to identify these sources of
error. A small percentage of actual SNPs will be discarded at this stage.

Clustering Error Filters: these types of errors result from the incorrect clustering of close
homologs, pseudogenes or from contamination by nonhuman sequences. The filters developed to
minimize these clustering errors are also statistically based. As above, these filters may reject a
fraction of actual SNPs.

Finishing Filters: these filters remove duplicate and redundant SNPs from the generated list of
SNPs, and remove SNPs which are from the hypervariable regions of hypervariable genes such as
immunoglobulin and T cell receptors.

Pre-processing steps

The sequences must first be trimmed to eliminate vector sequence, contamination and
repetitive sequences. Then certain low information content sequences (for example, long runs of a
single base, or two- or three-base repeats) and repetitive sequences (for example, Alu sequences in
humans) must be masked (changed to N's) to prevent over-clustering errors. The clustering process
then identifies the sets of sequences that are believed to be derived from the same original DNA
sequence or gene. The sequences in each cluster are then aligned using a method such as phrap, which
also defines a consensus sequence. It will be well recognized by those skilled in the art that there are
numerous existing programs for carrying out these processes, and the SNP discovery process
described herein will work equally well with any of them. In the instant embodiment, the preferred
processes are Blocked 1 for trimming and masking, a variety of different algorithms for clustering, and
phrap for the alignment. It will be recognized by those skilled in the art that phrap and other alignment
methods carry out a secondary clustering step which divides clusters into contigs, and carry out a
secondary trimming step which defines the end points of the portion of each sequence which
participates in the contig. The contigs may then be searched for the occurrence of SNPs.
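The masking step described above can be illustrated with a minimal Python sketch. It is not the trimming and masking program named in this embodiment; the function name `mask_low_information`, the minimum run length of 12 bases, and the use of regular expressions are assumptions chosen only for illustration. Long runs of a single base and tandem repeats of two or three bases are changed to N's.

```python
import re

# Hypothetical threshold: mask any repeat run spanning at least this many bases.
MIN_RUN = 12

def mask_low_information(seq: str, min_run: int = MIN_RUN) -> str:
    """Replace long single-base runs and 2- or 3-base tandem repeats with N's."""
    masked = seq.upper()
    for unit_len in (1, 2, 3):
        # Number of extra copies of the unit needed to reach min_run bases.
        min_repeats = max(1, -(-min_run // unit_len) - 1)
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats))
        # Keep the sequence length unchanged so alignment coordinates still apply.
        masked = pattern.sub(lambda m: "N" * len(m.group(0)), masked)
    return masked
```

Because the masked sequence keeps its original length, the same coordinates can be used later if the masked regions are unmasked after clustering.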

Errors in the trimming, clustering and alignment processes will cause SNP discovery errors,
usually false positives (the prediction of SNPs where they do not exist). Additional filters which are
the subject of the invention are designed to recognize and remove these errors by providing the ability
to identify likely errors in the processes and to correct them.

In some instances, it is preferred, as an optional step, to unmask regions of sequences which
were masked (because of low information content or repetitive sequence) during the clustering process.
These regions can be unmasked after clustering to allow discovery of SNPs within them.

Identification of Candidate SNP Sequences 

The first step in identifying candidate SNP sequences is to redefine the end points of each
sequence as the points within the previous end points where a stretch of at least 10 consecutive base
calls, containing at least eight base changes, matches the consensus sequence exactly. Sequence
trimming errors (both at the single sequence stage and at the alignment stage) contribute to false
positives when foreign sequence (vector, chimera or splice variant) is similar to the real sequence and
the true boundary is difficult to determine. This step is a conservative approach to avoid false
positives and also filters out lower-quality sequence at the ends. The reason the length of the match
with the consensus is measured in base changes is to avoid low significance matches on repetitive
sequence such as polyA.

The next step is, at each position of the alignment, to compare the base calls of all the aligned
sequences which are between their start and end positions, which have quality scores greater than
a set threshold, which have neighboring base calls which agree with the consensus sequence, and
where the neighboring base calls also have a quality score greater than the threshold. Preferably the
threshold is a phred quality score greater than or equal to 15. The possibilities are A, C, G, T, and - (deletion).

The next step is a Clone Filter: if there has been more than one base call for a
sequence position, then the clone for each sequence is identified and the sequences corresponding to
each clone are compared. If the base calls for different sequences from the same clone disagree,
then all the sequences for this clone at this base position are removed from consideration.

After all of these filters, positions for which there is more than one base call are candidate
SNPs. The "wild type" base call is the one in the consensus sequence and the others are designated
candidate SNPs. If the wild type base call is a deletion, then the SNP is considered to be an insertion
at the previous base.
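The per-position logic above (quality threshold, Clone Filter, then detection of more than one base call) can be sketched in Python as follows. This is an illustrative sketch only: the `Call` record, its field names, and the function `candidate_alleles` are hypothetical, and the check on neighboring base calls is assumed to have been computed beforehand and passed in as a flag.

```python
from collections import defaultdict
from typing import NamedTuple

class Call(NamedTuple):
    clone: str          # clone from which the sequence was derived
    base: str           # one of A, C, G, T or "-" (deletion)
    quality: int        # phred quality score of this base call
    neighbors_ok: bool  # flanking calls match the consensus and pass the threshold

def candidate_alleles(calls, consensus_base, threshold=15):
    """Return the non-consensus base calls that survive the filters at one position."""
    # Quality filter: keep calls at or above the threshold with good neighbors.
    kept = [c for c in calls if c.quality >= threshold and c.neighbors_ok]
    # Clone Filter: drop every sequence of a clone whose sequences disagree.
    bases_by_clone = defaultdict(set)
    for c in kept:
        bases_by_clone[c.clone].add(c.base)
    consistent = {clone for clone, bases in bases_by_clone.items() if len(bases) == 1}
    bases = {c.base for c in kept if c.clone in consistent}
    # A position is a candidate only if more than one base call remains.
    if len(bases) <= 1:
        return []
    return sorted(bases - {consensus_base})
```

For example, a position with a consensus A, one clone calling G at high quality, and a second clone whose two sequences disagree would yield the single candidate allele G.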

Automated Chromatogram Checking 

The next filters require opening of the chromatogram files for the sequences identified as
containing candidate SNPs. At each candidate SNP position, the chromatogram data of each
sequence passing the Identification Filters is extracted. The first step in this process utilizes a program,
ABIdump, to translate binary ABI chromatogram files into usable form.

Multiple Base Call Algorithm filter: the ABI base calls for each sequence are compared to the
phred base calls. If the base calls do not agree at the SNP position and the two adjacent flanking
positions, then the sequences are removed from consideration.

Intensity Filter: if the SNP is a single base change (this step is skipped for insertions and
deletions), then the processed intensity values for each of the four bases at the called chromatogram location
of the candidate SNP base are used to compute a ratio. If we call the intensity of the wild type base "wt", the
intensity of the SNP base "snp", the minimum of the other two "min", and the phred quality of the base
call "Q", then the wild type sequences must have

(snp - min) < (wt - min)(Q - 17)/37 and Q >= 17 to be considered high-quality, and

(snp - min) < (wt - min)(Q - 4)/37 and Q >= 15 to be considered a low quality pass.

The basis for these formulas is that if a base is miscalled, then there is likely to be a residual peak for
the correct base. The larger the peak for the wild type base, the less likely that the call of the SNP is
correct. The actual thresholds in the formulas are based on empirical data from clones which were
sequenced multiple times and which gave a set of confirmed SNPs and error rates for algorithm
optimization.

The candidate SNP passes only if at least one wild type sequence passes and at least one
SNP sequence passes. The quality of the candidate SNP is the lower of the highest wild type pass
level and the highest SNP pass level (if there is a high-quality wild type sequence but only low quality
SNP sequences, then the candidate is low quality). A SNP quality value is returned.
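The Intensity Filter inequalities and the rule for combining pass levels can be sketched in Python as below. The text states the inequalities for wild type sequences; applying the same test to SNP sequences with the roles of the two alleles exchanged is an assumption of this sketch, as are the function names `pass_level` and `candidate_quality`.

```python
# Ranking of pass levels; None means the sequence failed the filter.
RANK = {None: 0, "low": 1, "high": 2}

def pass_level(called, alternate, minimum, q):
    """Classify one sequence at a candidate SNP position.

    called    -- intensity of the base this sequence was called as
    alternate -- intensity of the competing allele (assumed symmetric for SNP reads)
    minimum   -- the smaller intensity of the two remaining bases
    q         -- phred quality of the base call
    """
    residual = alternate - minimum
    signal = called - minimum
    if q >= 17 and residual < signal * (q - 17) / 37:
        return "high"
    if q >= 15 and residual < signal * (q - 4) / 37:
        return "low"
    return None

def candidate_quality(wild_type_levels, snp_levels):
    """Lower of the best wild type level and the best SNP level, or None if either fails."""
    best_wt = max(wild_type_levels, key=RANK.get, default=None)
    best_snp = max(snp_levels, key=RANK.get, default=None)
    if RANK[best_wt] == 0 or RANK[best_snp] == 0:
        return None
    return min(best_wt, best_snp, key=RANK.get)
```

With a strong called peak (1000), a small residual peak (50), a baseline of 40 and Q = 40, the sequence is a high-quality pass; the same peaks at Q = 16 give only a low quality pass.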

Clone Error Quality Filters (somatic mutation/reverse transcriptase/polymerase errors)

The purpose of these filters is to remove errors which are actually in the clone, that is, the
clone sequence was correct but the clone does not represent the individual being sequenced. Three
possible sources of these errors are somatic mutations, errors made by reverse transcriptase in the
process of making cDNA, and DNA polymerase errors in those situations where the DNA has been
amplified by PCR at some point prior to insertion in the cloning vector. Somatic mutations can be a
particular problem in sequencing clones derived from cell lines.

Polymerase errors are specific to the type of sequencing protocol used. For example, reverse
transcriptase is involved in EST sequencing but not genomic clone sequencing. Polymerase is involved
in the creation of extension clones (polymerase is used in all sequencing reactions, but errors are less
likely to arise because only a fraction of the templates are affected, in contrast to the extension
process where a single polymerase product becomes a template for the entire reaction). This filter is
not applied to genomic sequences in the current embodiment on the premise that the genomic
sequences do not have polymerase errors, and that somatic mutations are likely to have the same
profile as real SNPs.

This filter also filters out rare SNPs as well as apparent SNPs which are not real. It is 
difficult to determine and confirm by experiments to what extent SNP candidates are too rare to be 
confirmed vs. simply not real. For many applications, very rare SNPs are of less utility than common 
ones such that this is not a problem; however in some applications it may be advisable to turn this filter 
off. 

Base change sequence analysis filter 

The premise of this filter is that the probabilities of different mutations differ depending on
the source. For example, true SNPs may be mostly transitions whereas reverse transcriptase
mutations could be primarily G to T mutations. While this does not allow one to determine for sure
that a given change is a true SNP, it allows one to evaluate the relative likelihood that a given mutation
is a true SNP. SNP confirmation data suggest that G/T SNP candidates in which there is only one
clone having the T allele have a very low probability of being real SNPs. These SNP candidates are
excluded from the high confidence set (they are kept in a different file; their confirmation rate is well
below 50 percent). The other set which had a very low confirmation rate is any A/T SNP.

Frequency Filter 

This filter is based on the concept that true SNPs have a different frequency profile than
clone errors and that a candidate SNP which is evident in only one clone in a deep alignment is less
likely to be real than one which appears in one clone in a shallow alignment. The likelihood of finding
a SNP at a given sequence location is a function of the number of chromosomes sequenced. This
curve is distinctly non-linear, as most SNPs are sufficiently frequent to be found with relatively few
sequences. The probability of an error of this type, however, is essentially linear in the number of
sequences, since the chance of the change occurring in two different sequences is independent. This
means that the probability that a candidate SNP observed in a single clone is a true SNP is lower if the
alignment is deep than if it is shallow. Any SNP occurring in a single clone in an alignment of more
than 20 clones (counting only high-quality sequences which have a chance of contributing a candidate
SNP) is excluded from the high confidence set.

This filter is the basis of a secondary method used to develop the base change sequence
analysis filter. Comparing the set of single clone SNPs from shallow alignments with those from deep
alignments, which are more likely to be errors, will reveal base changes which are more likely to be
associated with polymerase errors and somatic mutations.
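The two Clone Error rules stated above (the base change rule and the frequency rule) reduce to a few comparisons, sketched here in Python. The function name `high_confidence` and its parameters are hypothetical; the thresholds (a single clone, more than 20 clones) are the ones given in the text.

```python
def high_confidence(alleles, snp_clone_count, depth, t_clone_count=0):
    """Return True if a candidate SNP stays in the high confidence set.

    alleles         -- the two alleles, e.g. "GT" or ("G", "T")
    snp_clone_count -- number of clones showing the candidate SNP allele
    depth           -- number of clones with high-quality sequence at the position
    t_clone_count   -- number of clones carrying the T allele (used for G/T only)
    """
    pair = frozenset(alleles)
    # Base change rule: any A/T candidate had a very low confirmation rate.
    if pair == frozenset("AT"):
        return False
    # Base change rule: G/T candidates with only one clone having the T allele.
    if pair == frozenset("GT") and t_clone_count == 1:
        return False
    # Frequency rule: a single-clone SNP in an alignment of more than 20 clones.
    if snp_clone_count == 1 and depth > 20:
        return False
    return True
```

Candidates that fail are excluded from the high confidence set rather than deleted, consistent with their being kept in a separate file.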

Clustering Error Filters 

These filters are intended to remove candidate SNPs which result from the incorrect
clustering of similar sequences such as highly homologous genes, similar genomic sequences, and
contamination from other species where the sequences of the species have been mislabeled as
human.

Number of base change filter

This filter distinguishes homologous sequences from SNPs on the basis of the frequency of
variants. True SNPs occur about once per kb when comparing two sequences, or once per 2 kb if the
length of both sequences is included, and this fraction decreases as the depth of the alignment increases.
Since EST sequences tend to be about 500 bp or less in length, one would expect not
more than one SNP per four sequences. The number of SNPs in the cluster is divided by the number
of sequences in the cluster, and SNPs for which this number is larger than one are discarded. The
higher the number, the less likely the SNP is to be real. The threshold value of one was chosen
because it appears to correspond to roughly a 50 percent success rate; however, the threshold value
could be adjusted to a higher value to accept lower confidence SNPs.
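The expectation and the threshold test in this paragraph amount to simple arithmetic, sketched below in Python. The function names are hypothetical; the figures (500 bp reads, one SNP per 2 kb, threshold of one) are those stated above.

```python
def expected_snps_per_sequence(read_length_bp=500, bp_per_snp=2000):
    """Expected true SNPs per sequence: 500 bp at one SNP per 2 kb is 0.25."""
    return read_length_bp / bp_per_snp

def passes_base_change_count(snps_in_cluster, sequences_in_cluster, threshold=1.0):
    """Keep the cluster's SNPs unless SNPs per sequence is larger than the threshold."""
    if sequences_in_cluster <= 0:
        raise ValueError("a cluster must contain at least one sequence")
    return snps_in_cluster / sequences_in_cluster <= threshold
```

An expected value of 0.25 SNPs per sequence corresponds to the "one SNP per four sequences" figure, so a ratio above one is four times the expected rate.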

Distance from next polymorphism filter 

This filter calculates the number of SNPs for which the sequence is the only representative
within a window of 100 bases on either side, and discards any of the SNPs for which there is more
than one other SNP in this window. This threshold can be set higher, but the actual fraction of SNP
candidates which are true SNPs then drops off to less than 50 percent.

Haplotype clustering filter

When sequences from different sources are inappropriately clustered, it is possible to divide
them into two or more clusters which are consistent. In particular, if we take any two differences
between homologs and consider the haplotypes of the clones which overlap both SNPs, there are only
two haplotypes. In other words, a 2x2 matrix of haplotypes is diagonal, having only two non-zero
entries. If there are only two sequences, then this is expected. For each SNP, a 2x2 haplotype matrix
with each other SNP is computed. If it is diagonal, and there are more than two sequences, then the
sum of the diagonal elements minus one is a "cluster total" for this SNP. This "cluster total" number
has proven to be empirically correlated with the confirmation rate, probably because it predicts
clusters which contain paralogs, homologs and contamination from other species. Candidate SNPs
which have a cluster total of less than eight are kept. This threshold value for the cluster total can
be varied.
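A minimal Python sketch of the haplotype matrix calculation follows. Several points are assumptions made for illustration: alleles are coded 0 (consensus) and 1 (variant), "diagonal" is taken to include the anti-diagonal case, and the contributions from each other SNP are summed to give the cluster total. The function names are hypothetical.

```python
def haplotype_matrix(pairs):
    """Count haplotypes for two SNPs; pairs holds one (allele_a, allele_b) per clone."""
    m = [[0, 0], [0, 0]]
    for a, b in pairs:
        m[a][b] += 1
    return m

def is_diagonal(m):
    """True if exactly two entries are non-zero and they share no row or column."""
    main = m[0][0] > 0 and m[1][1] > 0 and m[0][1] == 0 and m[1][0] == 0
    anti = m[0][1] > 0 and m[1][0] > 0 and m[0][0] == 0 and m[1][1] == 0
    return main or anti

def cluster_total(pairings):
    """Sum (sequences - 1) over every other SNP whose matrix with this SNP is diagonal."""
    total = 0
    for pairs in pairings:
        m = haplotype_matrix(pairs)
        n = sum(map(sum, m))
        if n > 2 and is_diagonal(m):
            total += n - 1
    return total

def keep_candidate(total, threshold=8):
    """Candidates with a cluster total of less than the threshold are kept."""
    return total < threshold
```

Five clones that split cleanly into two haplotypes with one other SNP contribute 4 to the total; a pair seen in only two clones contributes nothing, since a diagonal matrix is expected in that case.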

Redundancy/finishing filters 

Redundant SNP filter: SNPs in different contigs of the same gene which have the same base
change and surrounding sequence are flagged as redundant. To accommodate possible splice variants,
this redundancy filter also applies to SNPs for which the surrounding sequence matches on only one
side.

T cell receptor/immunoglobulin filters

Sequences containing SNPs are filtered to remove SNPs in sequences that are homologous to T
cell receptor and immunoglobulin genes because both types of genes have hyper-variable regions
which could result in false positives.

Output file

SNP related data: With each candidate SNP a variety of data is kept, including the number and
sources of all contributing sequences (for example gene album, HTPS, FL, WashU/Merck, etc.), the
surrounding sequence, measures of the ratio and quality scores for the "best" sequence representing
each allele, etc.

Sequence related data: for each sequence associated with each SNP, the following data is kept:
the distance in each direction to the end of the sequence, the distance in each direction to the
next base different from the consensus and passing the initial quality filters, the library, tissue ID,
donor ID and comments (for example tumor, diseases, normal).

These methods have been described in patent applications entitled "Method for the
Identification of Sequence Polymorphisms using Polynucleotide Sequence Databases, and Single
Nucleotide Polymorphisms Identified Thereby" (Attorney Docket Nos. GX-0006 P and GX-0010 P),
and are hereby incorporated by reference.

b. Identification of polymorphisms in osteoarthritis associated genes by SSCP

The invention provides methods for detecting the presence of polymorphisms in candidate
genes of the invention. The invention also provides methods for distinguishing polymorphisms which
contribute to a particular disease (e.g. osteoarthritis) from polymorphisms which do not contribute to
the disease.

1. Identification of Polymorphisms in Candidate Genes 

Identification of polymorphisms in a candidate gene, according to the invention, will involve the
steps of isolating the candidate gene, determining its genomic structure and identifying polymorphisms
in the DNA sequences in any portion of the entire protein-coding region. The invention also provides
methods for identifying polymorphisms in the DNA sequences corresponding to RNA splice junctions.
The invention also provides methods for identifying polymorphisms in the DNA sequence
corresponding to the regulatory (promoter) region of the candidate gene.

A candidate gene is isolated by cloning methods well known in the art (described above).
Preferably the genomic structure of a candidate gene is determined by Southern blot analysis, as
described in Section C. It is expected that the entire sequence of an open reading frame (ORF) of an
average gene can be spanned by 16 PCR-amplified DNA fragments, or amplimers, of an
average length of 225 bp. It is expected that a smaller gene can be spanned by 1-2 amplimers and that
>50 amplimers are required to span extremely large genes. Primers useful for production of the
amplimers of a particular candidate gene are designed based on preexisting knowledge of the
sequence of the wild type gene, according to the primer design strategies described in Section A
entitled "Design and Synthesis of Oligonucleotide Primers."

For PCR amplification of a region to be tested by SSCP it is preferable to design primers that
amplify overlapping regions of the candidate gene. If a sequence variation is located in a region of a
candidate gene that corresponds to the region to which the primers hybridize, the primers will likely not
bind, the region containing this sequence variation will not be amplified and the variation will not be
detected in PCR based assays. By producing overlapping amplimers it is expected that virtually all of
the sequence variations in a particular candidate gene will be detected. The amount of overlap in the
amplimers is somewhat variable (approximately 20%) and the precise location of the overlapping
regions will depend on the location of regions comprising a sequence that is an appropriate primer
sequence. It is a possibility that a polymorphism will be located at a position just adjacent to the primer
site. Consequently, sequence information will be available for only 20 bp on one side of the
polymorphism and for 104-279 bp on the other side of the polymorphism. However, this should be a
sufficient amount of sequence information to allow definition of a unique sequence context in which to
define the particular polymorphism.

Based on screening analysis of 92 samples (184 chromosomes), it is expected that about 50%
of the amplimers will demonstrate polymorphisms, and that approximately 80% of these amplimers will
detect changes at single positions while the remaining 20% will detect base changes at two positions.
Based on these estimates, it is expected that there will be approximately 10 sequence variations per
open reading frame. However, the number of amplimers that demonstrate polymorphisms will vary
depending on the number of individuals tested, the ethnicity and structure of the population being
tested, and the region of DNA being tested.
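The estimate of approximately 10 sequence variations per open reading frame follows from the figures given above, as this short Python sketch shows. It assumes the average of 16 amplimers per open reading frame stated earlier in this section; the function name is hypothetical.

```python
def expected_variations_per_orf(amplimers=16, polymorphic_fraction=0.5,
                                single_fraction=0.8, double_fraction=0.2):
    """Expected sequence variations in an ORF spanned by the given number of amplimers."""
    # About half of the amplimers are expected to show a polymorphism: 16 * 0.5 = 8.
    polymorphic_amplimers = amplimers * polymorphic_fraction
    # 80% show one change and 20% show two: 0.8 * 1 + 0.2 * 2 = 1.2 changes each.
    changes_per_amplimer = single_fraction * 1 + double_fraction * 2
    return polymorphic_amplimers * changes_per_amplimer
```

The result is 8 x 1.2 = 9.6, which rounds to the approximately 10 variations per open reading frame quoted in the text.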

Preferably, each polymorphism will be detected in the context of an SSCP fragment.
Polymorphism analysis by fluorescent SSCP (fSSCP, described in detail in Section F entitled
"Identification and Characterization of Polymorphisms") uses PCR to generate an amplimer of DNA
to be studied. The region to be tested is defined as the region between the primers (i.e. the region that
is incorporated into the PCR product and reflects the sequence of the DNA sample being tested). The
PCR primers reflect the sequence of the DNA sample being tested and are incorporated into the PCR
product as one end of each strand of DNA in the PCR product. If a polymorphism occurs in a primer
binding site, either the PCR primer does not bind due to the mismatch and the PCR will not produce a
product, or the primer binds and an amplification step occurs wherein the primer is incorporated, but the
amplified product does not contain the polymorphism which occurs at the primer binding site.
Therefore, fSSCP provides a method of screening a DNA sequence located between PCR primers
for the presence of polymorphisms.

The sensitivity of the technique of fSSCP for detecting a polymorphism is affected by length,
such that there is a substantial decrease in the detection of polymorphisms in amplimers that are
greater than 300 bp in length. However, different conditions for performing SSCP at high sensitivity
with larger fragments, e.g. 800-1500 bp, have also been described. If the length of DNA screened per
amplimer is decreased then more amplimers are required to screen a region of a given size. Therefore,
efficient screening of a gene dictates that the lower limit of the size of an amplimer is 125 bp. To
attain specificity for a particular gene sequence, primers are usually 20-25 bp in length, and additional
criteria such as G:C content, and intra- and inter-primer complementarity are important considerations
in primer design (as described above). All of these considerations are addressed if the primer3
program (Copyright (c) 1996 Whitehead Institute for Biomedical Research) is employed to design
pairs of primers suitable for use in a single PCR reaction. Typically, program parameters are set so
that multiple amplimers are designed in the length range of 150-300 bp, with predicted primer melting
temperatures in the narrow range 60-62°C. The narrow temperature range increases the likelihood
that a single set of PCR conditions can be used to generate a wide variety of different amplimers.

If it is desirable to screen a contiguous stretch of DNA which is larger than the maximum
fragment size desired for sensitive polymorphism detection by fSSCP (300 bp) it is necessary to use
multiple amplimers (which are assayed separately) which span the region of interest. Since the primer
sites in an amplimer are not tested, these sequences need to be contained within another amplimer. To
test the primer sequence, overlapping amplimers are designed by an algorithm that evaluates a large
number of amplimers generated by the primer3 program for the optimum overlapping set according to
a cost function. Thus, a series of overlapping PCR amplification products can be used to test a
contiguous stretch of DNA. Constraints on primer design are such that the absolute minimum overlap
is rarely possible. As a result, some regions of overlap occur that result in 'double testing' of a
particular segment of DNA. The detection efficiency is affected by the sequence context of the
polymorphism; it is possible that a polymorphic site will be detected in only one of two different
amplimers which overlap the same site. One strategy that is useful for increasing polymorphism
detection efficiency is to design overlapping amplimers to generate 2-fold coverage of all sequences.

SSCP does not detect 100% of polymorphisms. The invention provides for detection of
polymorphisms with an efficiency of 95% under a single set of conditions using single coverage of
sequences; a 2-fold screening strategy can be employed if it is necessary to increase this detection
efficiency.

It is expected that the polymorphism can be located and detected anywhere in the SSCP
fragment except in the regions at each end that correspond to the sequence of the PCR primers. The
precise location and identity of the sequence variation(s) of a particular SSCP fragment can be
confirmed by sequencing the fragment as described in Section D entitled "Isolation of a Wild Type
Gene". The sequence of a candidate gene will be compared to the known sequence of a wild-type
version of the gene by using the following DNA/protein sequence analysis programs and methods.

There are a large number of freely available methods for performing sequence comparisons.
These methods differ in their speed of execution, their sensitivity, and the type of comparisons they
are able to make. For example, one can compare two DNA sequences, two protein sequences, a
DNA sequence to a protein sequence by conceptual translation, or DNA sequences as if they were
protein sequences, again by conceptual translation. The BLAST suite of programs (Altschul et al.,
1990, J. Mol. Biol. 215:403) is commonly used to perform the above-referenced type of analysis.
Although the BLAST suite of programs provides a rapid method of determining multiple distinct
similarities between two sequences, these programs are not guaranteed to find an optimal solution
when comparing two sequences according to a particular set of parameters. PSI-BLAST is a more
sensitive variant of BLAST that operates by iteratively searching the database while simultaneously
refining the query pattern based on the results of the searches. Other packages of programs that are
available and which have different specific properties include the HMMER, SAM, WISE, STADEN
and FASTA packages, and the programs est_genome, dotter, e-PCR, Clustal, crossmatch and phrap
(Pearson, 1996, Methods Enzymol., 266:227).

If sequence information is available for the intron-exon boundaries and for a region of the
intron (of approximately 30-150 bp) located immediately 5' of an intron-exon boundary, primers can be
designed to produce amplimers useful for identifying polymorphisms located in the RNA splice
junctions. Similarly, if the promoter region of a candidate gene has been sequenced, primers can be
designed to produce amplimers useful for identifying polymorphisms located in the promoter region.
Additional methods for detecting and isolating polymorphisms include, but are not limited to, fluorescent
polarization-TDI, mass spectroscopy, denaturing gradient gel electrophoresis, chemical cleavage of
mismatch, constant denaturant capillary electrophoresis, RNase cleavage, heteroduplex analysis,
sequencing by hybridization, DNA sequencing, representational difference analysis, and denaturing
high performance liquid chromatography, described below in Section F entitled "Identification and
Characterization of Polymorphisms".

2. Methods of Determining if a Polymorphism Contributes to Osteoarthritis
No two individuals (excluding identical twins or other clones) have the same sequence of 
DNA in their genome. Variability in gene sequences between individuals accounts for many of the 
obvious phenotypic differences (such as pigmentation of hair, skin, etc.) and many nonobvious ones 
(such as drug tolerance and disease susceptibility). In a population, the DNA sequence that occurs at 
the highest frequency at any given site is commonly referred to as the wild type sequence. The term 
"wild type sequence" can be misleading, however, because in different populations an alternative form 
of a DNA sequence may be predominant and thus considered wild type for that particular population. 
DNA polymorphisms are located throughout the genome, within and between genes, and the various 
forms may or may not result in differential gene function (as determined by comparing the function of 
two alternative forms of the same sequence). Most polymorphisms do not alter gene function and are 
called neutral polymorphisms. Some polymorphisms do have an effect on gene function, for example 
by changing the amino acid sequence of a protein, or by altering control sequences such as promoters 
or RNA splicing or degradation signals. 

Polymorphisms can be used in genetic studies to identify a gene involved in a disease. If a
polymorphism alters a gene function such that it increases disease susceptibility, then it will be present
more often in individuals with the disease than in those without the disease. Alternatively, if a
particular DNA variant is protective against a disease, it will be found more often in individuals without
the disease than in those with the disease. Statistical methods are used to evaluate polymorphism
frequencies found in diseased as compared to normal populations, and provide a means for establishing
a causal link between a polymorphism and a phenotype. To detect a significant association between a
disease and a polymorphic site, different tests may be used with either genotypic or allelic distributions.
The simplest test consists of a t-test wherein the frequency of the polymorphic alleles in normal
individuals and individuals with the disease phenotype is compared. A comparison of the genotypic
distribution in normal individuals and individuals with the disease phenotype can also be performed
using a chi-square test of homogeneity. These tests are implemented in all commercially or freely
available statistical packages, for example SAS and S+, and are even included in Microsoft Excel.
More sophisticated analyses will be performed by incorporating covariates, using methods such as
linear regression or logistic regression, and by accounting for the information provided by adjacent
polymorphic sites (multipoint analysis). An example of this type of program is the freely available
program "Analyze" by JD Terwilliger (currently available at the WWW site
ftp://ftp.well.ox.ac.uk/pub/genetics/analyze).

If a polymorphism has a phenotypic effect, a bias will exist in the distribution of
polymorphisms between groups that have and do not have the disease phenotype. This manner of
analysis can be used to study a trait that is not necessarily a disease; any trait can be studied by
comparing a group with a particular phenotypic form of a trait to a group with a different phenotypic
form of that trait. It is important that the cases and controls are correctly matched with regard to
ethnicity, environmental influences, and other factors which could affect the phenotype being studied.
Studies which test polymorphism frequencies within groups exhibiting different phenotypes and use
statistical methods to compare the group polymorphism frequencies and identify correlations with
phenotypes are known as "association studies".
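The chi-square test of homogeneity mentioned above can be sketched in plain Python for a table of genotype counts in cases and controls. The genotype counts in the usage note are invented for illustration and are not data from this specification; the function names are hypothetical. The p-value helper uses the exact closed form for two degrees of freedom, which applies to a comparison of three genotype classes in two groups.

```python
import math

def chi_square_homogeneity(cases, controls):
    """Return (statistic, degrees of freedom) for two rows of genotype counts.

    Assumes every genotype class has a non-zero total across the two groups.
    """
    if len(cases) != len(controls):
        raise ValueError("cases and controls must list the same genotype classes")
    rows = [list(cases), list(controls)]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(cases, controls)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if both groups share one genotype distribution.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(col_totals) - 1

def p_value_two_df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)
```

With hypothetical counts of 30, 50 and 20 for the three genotypes in 100 cases and 50, 40 and 10 in 100 controls, the statistic is about 9.44 with 2 degrees of freedom, giving a p-value below 0.05. For other numbers of genotype classes a statistical package should be used to obtain the p-value.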

Some polymorphisms that occur in a single gene can alter the function of a gene sufficiently
such that the polymorphism results in a disease (monogenic disease). However, many common human
diseases are polygenic; that is, they are the result of complex interactions of various forms of multiple
genes. In the case of polygenic diseases, the alteration of a single gene may not be detrimental per se,
but in combination with certain sequence variants of other genes, this altered DNA sequence may
contribute to a disease phenotype. DNA variants leading to monogenic diseases are usually rare in a
population due to the process of natural selection against those carrying the disease gene. As variants
in genes that are involved in polygenic disease do not produce the disease phenotype unless they occur
in the appropriate combination with other gene variants, normal individuals can carry a subset of the
disease-contributing variants without suffering adverse effects. Thus, disease-contributing gene
variants that are associated with polygenic diseases may exist at a high frequency in a normal
population. Selection against these disease variant forms of a gene will only occur when they are
present in the appropriate disease-causing combination and there may not necessarily be selection
against these gene variants in individuals carrying a subset of the disease-contributing variants. Neutral
DNA variants do not alter gene function or contribute to a disease, are under no selective pressure
and occur at variable frequencies within populations.

Monogenic diseases tend to be rare within the population, and therefore few patients may be
available for studies of these diseases. A polymorphism in a single specific gene is necessary and
usually sufficient to cause a monogenic disease, such that associations between the variant gene and
the phenotype are usually readily apparent. In cases where the expression of a mutation phenotype is
complete ("complete penetrance"), the polymorphism present in the disease gene will not be found
upon examination of a large number of normal individuals. If there is not complete penetrance then
some apparently normal individuals will contain the mutation; the difference in frequency of
occurrence of the variant gene in the disease group as compared to the normal population will reveal
that the variant is associated with the disease.

In polygenic diseases, variation at different genes occurs in a combination which alters 
susceptibility to the disease. Although several genes may have variant forms which can contribute to a 
disease phenotype, it is not always necessary for a contributing variant to be present at every gene 
potentially contributing to the disease in a given affected individual. For example, a hypothetical 
disease could be caused by a particular combination of variants at three of four genes, designated as 
A, B, C, and D. Appropriate susceptibility variants in combination at any three of the genes can cause 
the susceptibility, i.e. one person with increased susceptibility may have susceptibility variants in genes 
A, B, and C, while another individual with increased susceptibility to the same disease will have 
susceptibility variants in genes B, C, and D. Therefore, although not all affected individuals will have 
the same susceptibility variants, the net result is that a diseased population will have susceptibility 
variant forms of genes A, B, C, and D at a higher frequency than an unaffected population (as 
detected by association studies). 
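
The hypothetical four-gene example can be checked numerically. The following is a minimal sketch, assuming (purely for illustration) that a susceptibility variant is carried independently at each of genes A, B, C and D with the same frequency, and that an individual is affected when variants are present at three or more of the four genes. The function name and the frequency of 0.3 are hypothetical and are not taken from the specification.

```python
from itertools import product

GENES = ("A", "B", "C", "D")


def carrier_frequencies(variant_freq, min_variant_genes=3):
    """Return {gene: (carrier freq in affected, carrier freq in unaffected)}.

    Enumerates all 16 carrier/non-carrier combinations of the four genes.
    Assumes 0 < variant_freq < 1 so that both groups are non-empty.
    """
    affected_mass = 0.0
    unaffected_mass = 0.0
    carriers_affected = {gene: 0.0 for gene in GENES}
    carriers_unaffected = {gene: 0.0 for gene in GENES}

    for state in product((0, 1), repeat=len(GENES)):
        # Probability of this combination under independence.
        prob = 1.0
        for carries in state:
            prob *= variant_freq if carries else 1.0 - variant_freq

        affected = sum(state) >= min_variant_genes
        if affected:
            affected_mass += prob
        else:
            unaffected_mass += prob

        for gene, carries in zip(GENES, state):
            if carries and affected:
                carriers_affected[gene] += prob
            elif carries:
                carriers_unaffected[gene] += prob

    return {
        gene: (carriers_affected[gene] / affected_mass,
               carriers_unaffected[gene] / unaffected_mass)
        for gene in GENES
    }


for gene, (in_affected, in_unaffected) in carrier_frequencies(0.3).items():
    print(f"gene {gene}: affected {in_affected:.3f}, unaffected {in_unaffected:.3f}")
```

Under these assumptions no single gene carries a variant in every affected individual, yet each variant is markedly more frequent in the affected group than in the unaffected group, which is the signal an association study looks for.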

Unlike monogenic diseases which result from polymorphisms that are not present in control 
populations, the polymorphisms which contribute to the polygenic disease are also present in a normal 
population. As described in the example above, an individual with susceptibility polymorphisms in only 
one or two of the genes potentially contributing to the disease susceptibility will be normal with regard 
to disease susceptibility. Therefore, normal populations can be used to identify polymorphic regions of 
the genome in the population, and these regions can then be specifically tested in larger patient and 
control populations. Typically, a gene is analyzed for the presence of polymorphisms by testing 
between 2 and 100 normal individuals in order to establish if a particular polymorphism is present for 
that gene in the population. Once a polymorphic site(s) has been defined, the polymorphic site is then 
tested in case (disease) and control (normal) populations and statistical analyses are performed to 
identify polymorphisms which occur at significantly different frequencies in the two populations. 
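
One common way to compare variant frequencies between case and control populations is a chi-square test on a 2 x 2 table of allele counts. The sketch below is illustrative only: the specification does not name a particular test, and the function name and the allele counts are hypothetical. It uses only the Python standard library and assumes every row and column total is non-zero.

```python
import math


def allele_association(case_variant, case_other, control_variant, control_other):
    """Pearson chi-square test (1 degree of freedom) on a 2 x 2 allele-count table.

    Returns (chi_square, p_value). Assumes all marginal totals are non-zero.
    """
    table = [[case_variant, case_other], [control_variant, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts: 60 variant alleles among 200 case chromosomes,
# 40 variant alleles among 200 control chromosomes.
chi_square, p_value = allele_association(60, 140, 40, 160)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant")
```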

The determination of the statistical significance of polymorphism frequency differences is
dependent upon the size of the observed frequency difference between the populations, and on the
size of the populations being studied. If a significant difference is found, then it can be concluded that
an association exists between the polymorphism and the phenotype being studied. A statistically
significant difference is a frequency difference at a particular site between populations which would
be expected to occur by chance in only 5 out of 100 tests. That is, a difference which has a 95%
probability of being a true difference due to the effect of the gene.

The foregoing discussion describes a method of testing for an association between a
polymorphism which is the direct contributor to a disease and the disease phenotype. However,
polymorphisms which do not directly contribute to a disease can also be used to identify regions of the
genome which contain genes that contribute to the disease by virtue of their proximity to
disease-contributing polymorphisms.

In humans, DNA exists as 23 homologous pairs of linear molecules (chromosomes).
Recombination is a process which results in reciprocal exchanges of short homologous DNA
segments between these homologous DNA pairs. Only one of each of the 23 pairs of chromosomes is
inherited by the offspring from each parent. The inherited chromosome is thus made up of tandemly arrayed segments
of DNA derived from both of a pair of chromosomes. Consequently, DNA is transferred in segments 
from one generation to the next. Although the boundaries of each inherited segment may vary in each 
generation, the net effect is that sequences of DNA which are adjacent along the length of the 
molecule are inherited together at a higher frequency than sequences that are farther apart. If a region
(continuous linear segment) of DNA has two or more polymorphisms that are close together, they will
be co-inherited at a higher frequency than polymorphisms that are farther apart, as they are more 
likely to remain on the same segment of DNA during recombination. Therefore, if two or more 
polymorphisms are close together, they will occur together at a higher frequency in a population than 
would be expected by random segregation. This effect is known as linkage. Linkage studies are
performed using multiply affected individuals within families; the most commonly used approach is to
test markers located throughout the genome in many sets of affected sib pairs that share the same 
phenotype. Markers which are located in the region of a genome that contributes to the phenotype will 
be inherited in both siblings, along with the phenotype, at a higher frequency than expected by chance. 
Studies wherein data from many such families are compared can be used to implicate a region of a
genome as one that contributes to a particular phenotype.
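
The excess marker sharing expected in affected sib pairs can be tested with the classical "mean test". The sketch below is a simplified illustration and not a method recited in the specification: it assumes that the number of marker alleles each sib pair shares identical by descent (0, 1 or 2) is already known, whereas in practice it is usually estimated from parental genotypes. Under no linkage a pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2 and 1/4, so the proportion shared has mean 0.5 and variance 0.125 per pair. The function name and the counts are hypothetical.

```python
import math


def sib_pair_mean_test(shared_allele_counts):
    """One-sided mean test for excess allele sharing in affected sib pairs.

    shared_allele_counts: one value per sib pair, each 0, 1 or 2.
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    n_pairs = len(shared_allele_counts)
    mean_proportion = sum(shared_allele_counts) / (2.0 * n_pairs)
    # Null expectation 0.5; null variance of the per-pair proportion is 1/8.
    z = (mean_proportion - 0.5) / math.sqrt(0.125 / n_pairs)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_proportion, z, p_one_sided


# Hypothetical marker: 15 pairs share 0 alleles, 45 share 1, 40 share 2.
counts = [0] * 15 + [1] * 45 + [2] * 40
proportion, z, p_value = sib_pair_mean_test(counts)
print(f"mean sharing = {proportion:.3f}, z = {z:.2f}, p = {p_value:.5f}")
```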

Linkage disequilibrium (LD) association studies provide another method for using 
polymorphisms in genetic studies. The method of LD involves making a correlation at the population 
level, between the alleles (alternative polymorphic forms of the same sequence site) present at 
different genomic sites. If site 1 has two variant forms, A and a, and site 2 has two variant forms, B
and b, the observation in a population that allele A at site 1 is more often found with allele B at site 2
than with allele b is an example of LD. If allele B is a disease-contributing polymorphism, then testing
for allele A may show an association with the disease.
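
The two-site example can be quantified with the standard LD coefficient D (the observed frequency of the A-B haplotype minus the product of the two allele frequencies) and the squared correlation r2. The following is a minimal sketch; the haplotype counts are hypothetical, and the function assumes phased haplotype counts are available and that both sites are polymorphic in the sample.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic sites from haplotype counts.

    D = p(AB) - p(A) * p(B); r^2 = D^2 / (p(A) p(a) p(B) p(b)).
    Assumes both sites are polymorphic, so no allele frequency is 0 or 1.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    r_squared = d * d / (p_A * (1.0 - p_A) * p_B * (1.0 - p_B))
    return d, r_squared


# Hypothetical sample of 100 chromosomes in which A is found with B more
# often than expected from the allele frequencies alone.
d, r_squared = linkage_disequilibrium(n_AB=50, n_Ab=10, n_aB=10, n_ab=30)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}")
```

A value of D equal to zero corresponds to alleles at the two sites occurring together no more often than expected by chance.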

Linkage disequilibrium may be generated in several ways. Maintenance of LD in a population
allows a disease association to be detected many generations after the formation of LD. The
maintenance of LD is explained by linkage: the closer the two loci, the longer (in terms of number of
generations) that particular LD is maintained. As a result, polymorphisms which do not directly
contribute to a disease can be used to identify regions of the genome which contain a
disease-contributing polymorphism. If a polymorphism affects gene function such that it contributes to a
phenotype being studied and is found to be associated with the phenotype, nearby (neutral)
polymorphisms which are in LD with the disease polymorphism may also show an association with the
disease. Conversely, if a polymorphism does not affect gene function but is found to be associated
with a particular phenotype, this polymorphism is in LD with a different, but adjacent polymorphism
that affects gene function such that it contributes to the phenotype being studied. If a neutral
polymorphism is always inherited with a phenotype-contributing polymorphism, then the strength of
the association of the neutral polymorphism to the phenotype will be equal to that of the polymorphism
which affects gene function and is contributing to the phenotype. A polymorphism which shows an
association with a phenotype (for instance with disease susceptibility) is a marker for that phenotype
and implicates the region in which the polymorphism resides as a region containing a polymorphism
which contributes to the phenotype. Additional flanking polymorphisms can be tested to determine the
precise location of the true phenotype-contributing variant.
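
The relationship between the distance separating two loci and the persistence of LD can be illustrated with the standard population-genetic decay model, in which the LD coefficient D shrinks by a factor of (1 - c) each generation, c being the recombination fraction between the loci. The sketch below assumes a large, randomly mating population with no selection, mutation or admixture; these are simplifying assumptions made for illustration, and the recombination fractions used are hypothetical.

```python
import math


def ld_after_generations(d_initial, recombination_fraction, generations):
    """LD coefficient remaining after a number of generations: D0 * (1 - c)^t."""
    return d_initial * (1.0 - recombination_fraction) ** generations


def generations_to_halve(recombination_fraction):
    """Number of generations for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - recombination_fraction)


# Two hypothetical marker pairs that start with the same amount of LD.
for label, c in (("closely spaced loci", 0.001), ("widely spaced loci", 0.1)):
    remaining = ld_after_generations(0.2, c, generations=100)
    print(f"{label}: D after 100 generations = {remaining:.4f}, "
          f"half-life = {generations_to_halve(c):.0f} generations")
```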

Linkage studies on families, and LD studies on populations, have different degrees of
resolution with regard to defining the size of a DNA region which contains the
phenotype-contributing polymorphism. In general, linkage studies define an interval which potentially contains tens
to hundreds of genes, while LD studies have been used to implicate single genes in the development of
a particular phenotype.

3. Test Populations Useful for Polymorphism Genotyping 
The invention provides methods of determining allelic frequencies by performing genotypic
analyses in appropriate test populations.

Study cohorts:

Osteoarthritis Progression Cohort

Derived from a population of normal women aged 45-65. The original aims of the study, started
in 1989, were to assess how many women around menopausal age would get arthritis and what factors
predispose them to developing it, and also to look into factors that may be associated with progression of the
disease.

A series of examinations, x-rays and questionnaires about lifestyle factors were carried out on
1003 women that were recruited to the study. This study has been going for 10 years. As a result, a
unique, world-renowned and well-respected study is available looking at the reasons why women develop
osteoarthritis, potential risk factors and the genetics of the disease.

Prospective Severe Outcomes Cohort (case-control) 

Five hundred joint replacement cases will be ascertained, as will age-, ethnicity- and gender-
matched controls. The clinical data envisaged are: HRT use, number of joints affected, occupation,
injury history, age, BMI.

The list of relevant studies is shown in the following table.

Study type: Pilot 1
Population details: 100 progressors + 75 non-progressors, 100 normals, all female from the
progression cohort. Detailed clinical data, 10 yr. follow-up: joint-space [remainder illegible].
Reasonable objectives: Large genetic effects for fast OA progression, proof of principle.
Correlation with biomarkers. Possible [remainder illegible].
Timing: 6-8 months.

Study type: Biomarker study
Population details: 800 women from progression cohort. DNA, serum, urine, 5 biomarkers.
Reasonable objectives: Correlation of genetics with biomarkers [remainder illegible].
Timing: 12 months.

Study type: Progression hand & knee OA study
Population details: ~800 women from progression cohort. Detailed clinical data: joint-space
narrowing/yr., joints affected, BMD (hip and spine), fractures, CRP levels, full lipid measurements,
incidence of fractures (assessed by X-rays), 10 yr. follow-up radiographs for all patients.
Reasonable objectives: Genetic effects of OA progression. Risk of OA. Correlation with biomarkers.
Possible novel target. Genetic effects of osteoporosis risk, correlation with BMD. Possibly genetic
effects of [remainder illegible].
Timing: 18 months.

Study type: Case-control
Population details: ~500 cases (joint replacements) vs 500 matched controls. Prospective study:
DNA + 2 biomarkers. Clinical data [partly illegible]: joints affected, occupation, BMI.
Reasonable objectives: Large genetic effects for OA risk, proof of principle. Possible novel target.
Timing: ~6-12 months for collection.

4. Assays Useful for Determining the Association of a Polymorphism with Osteoarthritis

Clinical parameters 

There is a general consensus that assessment of radiological changes is the preferred method for
epidemiological studies on the basis of cross sectional and prospective correlations between severity of
X-ray changes and the presence of pain and loss of function. In osteoarthritis, the loss of cartilage
produces a narrowed space between bones. The pattern of joint space narrowing can help distinguish 
between osteoarthritis and rheumatoid arthritis. Bone spurs (osteophytes) also help diagnose 
osteoarthritis. Other relevant clinical end points are pain, disability, function, joint replacement and 
maintenance of joint structure. Stages of disease progression are as follows: 

Early stage: focal swelling of articular cartilage followed by the appearance of irregularities in the 
surface. 

Intermediate stage: progressive degradation and loss of articular cartilage. Also characterised by
fibrillation (vertical splitting), detachment (horizontal splitting) and thinning of the cartilage.

Late stage: Articular cartilage is almost completely destroyed. Bony outgrowths (osteophytes) occur 
at the joint margins resulting in residual arthritis. Characterised by pain and limitation of joint 
movement. 

Clinical measurements of OA 

Quantitative traits of interest for the study of OA and its progression are: 

- Osteophyte count
- Joint space narrowing (mm/yr.)
- Number of joints affected
- Types of joints affected

In addition, a series of biochemical markers can provide valuable information, such as:

- COMP
- CRP
- HA
- Procollagen Type II
- Bone resorption markers (e.g. collagen cross-links)

Confounding factors

Most currently recognised environmental risk factors for prevalent knee OA (obesity, knee
injury, and physical activity) influence incidence more than radiographic progression. Furthermore,
these factors might selectively influence osteophyte formation more than joint space narrowing. These 
findings are consistent with knee OA being initiated by joint injury, but with progression being a 
consequence of impaired intrinsic repair capacity. 

Other known confounding factors are steroid (glucocorticoid) use and, in women, hormone 
replacement therapy. Glucocorticoids ameliorate erosion in animal OA models and suppress synthesis 
of matrix metalloproteinases (Saito et al. 1999). Estrogen replacement therapy, on the other hand, has 
been shown to have a moderate, but not statistically significant, protective effect against worsening of 
OA both in the Chingford (Hart et al. 1999) and Framingham (Zhang et al. 1998) studies.

5. Methods of Genotyping Polymorphisms 

The invention discloses methods for performing polymorphism genotyping. These methods can 
be used to detect the presence of a polymorphism in a sample comprising DNA or RNA. 

A DNA sample for analysis according to the invention may be prepared from any tissue or 
cell line, and preparative procedures are well-known in the art. The preparation of genomic DNA is 
performed as described in Section B. 

RNA samples may also be useful for genotyping according to the invention. Isolation of RNA 
can be performed according to the following methods. 

RNA is purified from mammalian tissue according to the following method. Following removal 
of the tissue of interest, pieces of tissue of <2g are cut and quick frozen in liquid nitrogen, to prevent 
degradation of RNA. Upon the addition of a volume of 20 ml tissue guanidinium solution per 2 g of 
tissue, tissue samples are ground in a tissuemizer with two or three 10-second bursts. To prepare
tissue guanidinium solution (1 L), 590.8 g guanidinium isothiocyanate is dissolved in approximately 400
ml DEPC-treated H2O. 25 ml of 2 M Tris-Cl, pH 7.5 (0.05 M final) and 20 ml Na2EDTA (0.01 M
final) are added, the solution is stirred overnight, the volume is adjusted to 950 ml, and 50 ml 2-ME is
added.
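
The final concentrations quoted for the tissue guanidinium solution can be cross-checked with simple dilution arithmetic. The sketch below is a check under stated assumptions, not part of the protocol: the molecular weight of guanidinium isothiocyanate is taken as 118.16 g/mol, and the stock concentration of the Na2EDTA solution, which the text does not state, is back-calculated from the stated final concentration rather than taken from the source.

```python
FINAL_VOLUME_ML = 1000.0  # 950 ml adjusted volume plus 50 ml 2-ME


def molarity_from_mass(grams, molecular_weight, final_volume_ml):
    """Molar concentration obtained by dissolving a mass in a final volume."""
    return grams / molecular_weight / (final_volume_ml / 1000.0)


def final_molarity(stock_molarity, stock_volume_ml, final_volume_ml):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock_molarity * stock_volume_ml / final_volume_ml


def implied_stock_molarity(target_molarity, stock_volume_ml, final_volume_ml):
    """Stock concentration implied by a stated final concentration and volume."""
    return target_molarity * final_volume_ml / stock_volume_ml


# Assumed molecular weight of guanidinium isothiocyanate: 118.16 g/mol.
guanidinium_molar = molarity_from_mass(590.8, 118.16, FINAL_VOLUME_ML)
tris_molar = final_molarity(2.0, 25.0, FINAL_VOLUME_ML)
edta_stock_molar = implied_stock_molarity(0.01, 20.0, FINAL_VOLUME_ML)
mercaptoethanol_percent = 100.0 * 50.0 / FINAL_VOLUME_ML

print(f"guanidinium isothiocyanate: {guanidinium_molar:.2f} M")
print(f"Tris-Cl: {tris_molar:.3f} M final")
print(f"Na2EDTA stock implied by 0.01 M final: {edta_stock_molar:.2f} M")
print(f"2-ME: {mercaptoethanol_percent:.1f}% (v/v)")
```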

Homogenized tissue samples are subjected to centrifugation for 10 min at 12,000 x g at 12°C. 
The resulting supernatant is incubated for 2 min at 65°C in the presence of 0.1 volume of 20%
Sarkosyl, layered over 9 ml of a 5.7 M CsCl solution (0.1 g CsCl/ml), and separated by centrifugation
overnight at 113,000 x g at 22°C. After careful removal of the supernatant, the tube is inverted and 
drained. The bottom of the tube (containing the RNA pellet) is placed in a 50 ml plastic tube and 
incubated overnight (or longer) at 4°C in the presence of 3 ml tissue resuspension buffer (5 mM 
EDTA, 0.5% (v/v) Sarkosyl, 5% (v/v) 2-ME) to allow complete resuspension of the RNA pellet. The 
resulting RNA solution is extracted sequentially with 25:24:1 phenol/chloroform/isoamyl alcohol,
followed by 24:1 chloroform/isoamyl alcohol, precipitated by the addition of 3 M sodium acetate, pH 
5.2, and 2.5 volumes of 100% ethanol, and resuspended in DEPC water (Chirgwin et al., 1979, 
Biochemistry, 18: 5294). 

Alternatively, RNA is isolated from mammalian tissue according to the following single step 
protocol. The tissue of interest is prepared by homogenization in a glass teflon homogenizer in 1 ml
denaturing solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7.0, 0.1 M 2-ME, 0.5%
(w/v) N-laurylsarkosine) per 100 mg tissue. Following transfer of the homogenate to a 5-ml
polypropylene tube, 0.1 ml of 2 M sodium acetate, pH 4, 1 ml water-saturated phenol, and 0.2 ml of 
49:1 chloroform/isoamyl alcohol are added sequentially. The sample is mixed after the addition of 
each component, and incubated for 15 min at 0-4°C after all components have been added. The 
sample is separated by centrifugation for 20 min at 10,000 x g, 4°C, precipitated by the addition of 1 ml 
of 100% isopropanol, incubated for 30 minutes at -20°C and pelleted by centrifugation for 10 minutes
at 10,000 x g, 4°C. The resulting RNA pellet is dissolved in 0.3 ml denaturing solution, transferred to a 
microfuge tube, precipitated by the addition of 0.3 ml of 100% isopropanol for 30 minutes at -20°C, 
and centrifuged for 10 minutes at 10,000 x g at 4°C. The RNA pellet is washed in 70% ethanol, dried, 
and resuspended in 100-200 ul DEPC-treated water or DEPC-treated 0.5% SDS (Chomczynski and
Sacchi, 1987, Anal. Biochem., 162: 156). 

RNA prepared according to either of these methods can be used for genotyping by the 
methods of Northern blot analysis, S1 nuclease analysis and primer extension analysis (Ausubel et al.,
supra). 

cDNA samples, i.e., DNA that is complementary to RNA such as mRNA, also may be prepared
according to the invention. The preparation of cDNA is well-known and well-documented in the
prior art.

cDNA is prepared according to the following method. Total cellular RNA is isolated (as 
described) and passed through a column of oligo(dT)-cellulose to isolate polyA RNA. The bound 
polyA mRNAs are eluted from the column with a low ionic strength buffer. To produce cDNA
molecules, short deoxythymidine oligonucleotides (12-20 nucleotides) are hybridized to the polyA tails
to be used as primers for reverse transcriptase, an enzyme that uses RNA as a template for DNA 
synthesis. Alternatively, mRNA species can be primed from many positions by using short 
oligonucleotide fragments comprising numerous sequences complementary to the mRNA of interest as 
primers for cDNA synthesis. The resultant RNA-DNA hybrid can be converted to a double stranded
DNA molecule by a variety of enzymatic steps well-known in the art (Watson et al., 1992,
Recombinant DNA, 2nd edition, Scientific American Books, New York). 

Tissues or fluids which are useful for obtaining a DNA or RNA sample according to the 
invention include but are not limited to plasma, serum, spinal fluid, lymph fluid, external secretions of 
the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and
samples of in vitro cell culture constituents.

Genotyping methods which are useful according to the invention, i.e., for the detection of
polymorphisms in nucleic acid samples isolated from individuals, are disclosed below.

Single Strand Conformation Polymorphism (SSCP) Screening and Fluorescent SSCP Screening
(fSSCP)

SSCP Analysis 

One technique for detecting DNA sequence variations in a biological sample is single strand 
conformation polymorphism (SSCP) (Glavac et al., 1993, Hum. Mut. 2:404; Sheffield et al., 1993,
Genomics 16:325). SSCP is a simple and effective technique for the detection of single base changes.
This technique is based on the principle that single-stranded DNA molecules assume specific 
sequence-based secondary structures (conformers) under nondenaturing conditions. The detection of 
point mutations by single stranded conformation polymorphism is believed to be due to an alteration in 
the structure of single stranded DNA. Molecules differing by only a single base substitution may
assume different conformers and migrate differently in a nondenaturing polyacrylamide gel. Single
stranded DNAs that contain sequence variations are identified by an abnormal mobility on 
polyacrylamide gels. SSCP detects all types of point mutations and short insertions or deletions that 
are located between the PCR primers (within the probe region) with apparently equal efficiency. This
technique has proven useful for detection of multiple mutations and polymorphisms, including SNPs.
SSCP sensitivity varies dramatically with the size of the DNA fragment being analyzed. The optimal 
size fragment for sensitive detection by SSCP is approximately 125-300bp. 

The mobility of a single stranded DNA or double stranded DNA fragment during
electrophoresis through a gel matrix is dependent on its size. Small molecules migrate more rapidly
than large molecules because they pass through the pores in the matrix more easily. Conventionally,
electrophoresis of single stranded DNA involves a 'denaturing' gel which maintains the single 
strandedness of the molecules. The denaturant is typically urea in polyacrylamide gels, and typically 
formamide or sodium hydroxide in agarose gels. In contrast, according to the SSCP screening
protocol, single-stranded DNA is analyzed on a 'nondenaturing' gel. When single stranded DNA is
analyzed on a 'non-denaturing' gel, intramolecular interactions can occur. In particular, the single 
stranded DNA is able to (partially) bind to itself. Consequently, DNA that is separated by 
electrophoresis on an SSCP gel does not migrate as a linear molecule but rather, the mobility of the 
DNA on an SSCP gel is governed by both its size and tertiary structure (conformation). The tertiary
structure of a single stranded DNA fragment is dependent on the sequence of the entire fragment.
Therefore, if a polymorphism exists in a given fragment, the conformation will usually be altered. The
technique is performed as follows.

One or more test DNA samples are prepared for analysis as described above, and subjected to
PCR amplification. Oligonucleotide primers are designed and synthesized as described above.
Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH
9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of
non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci mmol-1; 10 mCi ml-1), 0.2 uM each primer, 50
ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR
cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing
temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing
temperature is different for each PCR primer pair and can be optimized according to the parameters
described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in
a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP,
dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci mmol-1; 10 mCi ml-1), 0.2 uM of
each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed
by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as
well as the number of cycles, is adjusted in accordance with the stringency requirements, as described
above.
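
For planning purposes, the programmed hold time of each cycling profile can be totalled as shown in the following sketch. The figures are taken from the two profiles described above; the function name is hypothetical, and the calculation ignores the time a thermal cycler spends ramping between temperatures, so actual run times will be longer.

```python
def programmed_hold_seconds(initial_hold_s, cycle_step_seconds, cycles, final_extension_s):
    """Sum of programmed hold times, excluding temperature ramping."""
    return initial_hold_s + cycles * sum(cycle_step_seconds) + final_extension_s


# Taq profile: 94 C 3 min; 35 x (94 C 1 min, anneal 30 s, 72 C 45 s); 72 C 5 min.
taq_seconds = programmed_hold_seconds(180, (60, 30, 45), 35, 300)

# Vent profile: 98 C 5 min; 35 x (98 C 1 min, anneal 45 s, 72 C 1 min); 72 C 5 min.
vent_seconds = programmed_hold_seconds(300, (60, 45, 60), 35, 300)

print(f"Taq profile:  {taq_seconds} s ({taq_seconds / 60:.1f} min)")
print(f"Vent profile: {vent_seconds} s ({vent_seconds / 60:.1f} min)")
```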

SSCP analysis is performed as follows. Ten ul of formamide dye (95% formamide, 20 mM
EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol) are added to 10 ul aliquots of radiolabeled
PCR product. Following denaturation at 100°C for 5 min, the reaction mixture is placed on ice. Two ul
aliquots are loaded onto 8% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM Tris-borate, 1 mM
EDTA), 5% glycerol gels. Electrophoresis is carried out at 25 W at 4°C for 8 hours in 0.5X TBE.
Dried gels are exposed to X-OMAT AR film (Kodak) and the autoradiographs are analyzed and
scored for aberrant migration of bands (band shifts). SSCP may be optimized, as desired, as taught in
Glavac et al., 1993, Hum. Mut. 2:404.

fSSCP Analysis 

Techniques for screening multiple DNA samples simultaneously are also useful for performing
rapid genotyping analysis on a large number of samples according to the invention. By pooling and
multiplexing DNA samples in fluorescent SSCP (fSSCP) assays, the high throughput required for
detecting sequence variations in a large number of samples is achieved (Makino et al., 1992, PCR
Methods Appl. 2:10; Ellison et al., 1993, BioTechniques 15:684). According to the method of fSSCP,
PCR products are visualized and analyzed using an ABI fluorescent DNA sequencing machine.
Different primer pairs are identified by different color fluorochromes (4 different fluorochromes are
now available). fSSCP offers the following advantages over SSCP. Unlike SSCP, fSSCP does not
require handling of radioactive materials. Furthermore, the fSSCP technique allows for automated data
collection and automated data analysis programs that detect aberrantly migrating samples. In contrast, SSCP
evaluation involves visual examination by an individual, and does not provide a means for correcting
for lane to lane variations in electrophoretic conditions, as does fSSCP analysis.
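
Automated detection of aberrantly migrating samples can be illustrated with a simple robust outlier screen. The sketch below is hypothetical and is not the algorithm used by any commercial analysis software: it assumes each sample's peak position has already been normalized against the internal size standard run in the same lane, and it flags samples whose position deviates from the median of all samples by more than a chosen number of robust standard deviations. The sample names, positions and threshold are illustrative assumptions.

```python
from statistics import median


def flag_aberrant_samples(peak_positions, threshold=3.5):
    """Return the sorted names of samples with outlying peak positions.

    peak_positions: {sample name: peak position normalized to the lane standard}.
    Uses the median and the median absolute deviation (MAD) so that a few
    variant samples do not distort the reference position.
    """
    values = list(peak_positions.values())
    centre = median(values)
    mad = median(abs(value - centre) for value in values)
    if mad == 0:
        # All typical samples coincide; any differing sample is flagged.
        return sorted(name for name, value in peak_positions.items() if value != centre)

    flagged = []
    for name, value in peak_positions.items():
        robust_z = 0.6745 * (value - centre) / mad
        if abs(robust_z) > threshold:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical normalized peak positions for one amplicon across six samples.
positions = {"s1": 200.1, "s2": 200.0, "s3": 199.9,
             "s4": 200.2, "s5": 203.0, "s6": 200.0}
print(flag_aberrant_samples(positions))
```

Flagged samples would then be re-amplified and sequenced to identify the underlying sequence change.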

fSSCP analysis is performed as follows.

Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl,
pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, dCTP,
0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA
(or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR cycling profile is as
follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C,
45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for
each PCR primer pair. Amplifications using Vent Taq polymerase (New England Biolabs) are
performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of
dGTP, dATP, dTTP, dCTP, 0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or
JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed
by a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.

Two ul of fluorescent PCR products are added to 3 ul formamide dye (95% formamide, 20 mM
EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), denatured at 100°C for 5 min, then placed on
ice. Thereafter, 0.5-1 ul of Genescan™ 1500 size markers are added as an internal standard. Two ul
of the mix is loaded onto 8% or 10% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM Tris-
borate, 1 mM EDTA), 5% glycerol gels and electrophoresis is performed on an ABI 377 DNA
sequencing machine. Gel temperature is maintained between 4° and 10°C by an external cooling unit
connected to the internal cooling plumbing and chambers. Electrophoresis is carried out at 2500-3500
volts for 4-10 hours in 0.5X TBE. Data is automatically collected and analyzed with Genescan and
Genotype analysis software (ABI).
The fSSCP procedure identifies regions of 150-300 base pairs containing a sequence
variation. To identify the exact sequence change, the fragment which demonstrates the aberrant
migration is amplified again from the same biological sample, using non-fluorescent primers. The
sequence is then determined using standard DNA sequencing methods well known to those skilled in
the art (Ausubel et al., supra).

Although SSCP and fSSCP techniques are preferred according to the invention, other
methods for detecting sequence variations, including DNA sequencing, can be employed. Additional
techniques for detecting DNA sequence variations useful according to the invention are described
below.

Fluorescence Polarization-TDI

Fluorescence polarization-TDI is another preferred technique according to the
invention for the detection of sequence variations. Template-directed primer extension is a dideoxy
chain terminating DNA sequencing protocol designed to ascertain the nature of the one base
immediately 3' to the sequencing primer that is annealed to the target DNA immediately upstream
from the polymorphic site. In the presence of DNA polymerase and the appropriate
dideoxyribonucleoside triphosphate (ddNTP), the primer is extended specifically by one base as
dictated by the target DNA sequence at the polymorphic site. By determining which ddNTP is
incorporated, the alleles present in the target DNA can be determined.

Fluorescence polarization is based on the observation that when a fluorescent molecule is
excited by plane-polarized light, it emits polarized fluorescent light into a fixed plane if the molecules
remain stationary between excitation and emission. However, because the molecule rotates and
tumbles in solution, fluorescence polarization is not observed fully by an external detector. The
fluorescence polarization of a molecule is proportional to the molecule's rotational relaxation time,
which is related to the viscosity of the solvent, absolute temperature, molecular volume, and the gas
constant. If the viscosity and temperature are held constant, then fluorescence polarization is directly
proportional to the molecular volume, which is directly proportional to the molecular weight. If the
fluorescent molecule is large (with high molecular weight), it rotates and tumbles more slowly in
solution and fluorescence polarization is preserved. If the molecule is small (with low molecular
weight), it rotates and tumbles faster and fluorescence polarization is largely lost (depolarized).

In the FP-TDI assay, the sequencing primer is an unmodified primer with its 3' end
immediately upstream from a polymorphic or mutation site. When incubated in the presence of
ddNTPs labeled with different fluorophores, the allele-specific dye ddNTP is incorporated onto the TDI
primer in the presence of DNA polymerase and target DNA. The genotype of the target DNA
molecule can be determined simply by exciting the fluorescent dye in the reaction and determining
whether a change in fluorescence polarization occurs (Chen et al., 1999, Genome Res., 9:492).
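
The placement of a TDI primer relative to the polymorphic site can be sketched as follows. The example is illustrative only: the function names, the short sequence and the primer length are hypothetical, the primer is taken from the same strand as the reference sequence supplied (so the terminator incorporated matches the allele base on that strand), and no check of melting temperature or secondary structure is made.

```python
def design_tdi_primer(reference_sequence, snp_index, primer_length=20):
    """Return the primer whose 3' end lies immediately upstream of the SNP.

    reference_sequence: 5'->3' sequence of the strand the primer copies from.
    snp_index: 0-based position of the polymorphic base in that sequence.
    """
    if snp_index < primer_length:
        raise ValueError("not enough upstream sequence for the requested primer length")
    return reference_sequence[snp_index - primer_length:snp_index]


def terminators_for_alleles(alleles):
    """Map each allele base to the ddNTP expected to be incorporated for it."""
    return {allele.upper(): f"dd{allele.upper()}TP" for allele in alleles}


# Hypothetical 10-base sequence with a G/A polymorphism at index 6.
sequence = "ACGTACGTAC"
primer = design_tdi_primer(sequence, snp_index=6, primer_length=4)
print(primer)                                   # the four bases preceding the SNP
print(terminators_for_alleles(("G", "A")))
```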

One or more test DNA samples are prepared for analysis as described above, and subjected to
PCR amplification. Oligonucleotide primers are designed and synthesized as described above.
Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH
9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of
non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci mmol-1; 10 mCi ml-1), 0.2 uM each primer, 50
ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR
cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing
temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing
temperature is different for each PCR primer pair and can be optimized according to the parameters
described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in
a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP,
dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci mmol-1; 10 mCi ml-1), 0.2 uM of
each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed
by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as
well as the number of cycles, is adjusted in accordance with the stringency requirements, as described
above.

Following PCR amplification, unused PCR primers and dNTPs are destroyed by adding 2 ul of
PCR product to 2 ul of SAP/Exonuclease cocktail (0.1 U shrimp alkaline phosphatase (1
U/ul, Amersham Pharmacia Biotech, Inc., Piscataway, NJ) and 0.2 U E. coli exonuclease I (10 U/ul,
Amersham) in SAP buffer (20 mM Tris-HCl, pH 8.0; 10 mM MgCl2, Amersham)) per well of a 384-well
Black PCR plate (ABT). The mixtures are incubated at 37°C for 60 min before the enzymes are heat
inactivated at 95°C for 15 min. The mixture is held at 4°C until used in the FP-TDI assay.

To the enzymatically treated PCR product is added 2 µl of TDI reaction cocktail containing TDI
buffer (50 mM Tris-HCl (pH 9.0), 50 mM KCl, 5 mM NaCl, 2 mM MgCl2, 8% glycerol), 1 µM TDI
primer, 12.5 nM of each of two allele-specific dye-labeled ddNTPs (ROX-ddGTP, BFL-ddATP,
TAMRA-ddCTP, or R6G-ddUTP; NEN Life Science Products, Inc., Boston, MA), and 0.32 U Thermo
Sequenase (Amersham). The reaction mixtures are incubated at 94°C for 15 min, followed by 34
cycles of 94°C for 30 seconds and 55°C for 15 seconds. Upon completion of the reaction cycles, the
samples are held at 4°C.

After the primer extension reaction, 24 µl of TE buffer/methanol (2:1) is added to each
sample well, and the fluorescence polarization is measured using a LJL Analyst (LJL Biosystems, 
Sunnyvale, CA). 

Denaturing Gradient Gel Electrophoresis

Denaturing gradient gel electrophoresis (DGGE) is a gel system which allows electrophoretic 
separation of DNA fragments differing in sequence by a single base pair. The separation is based
upon differences in the temperature of strand dissociation of the wild-type and mutant molecules. 
During electrophoresis, fragments migrating through the gel are exposed to an increasing
concentration of denaturant in the gel. When the DNA fragments are exposed to a critical level of
denaturant, the DNA strands begin to dissociate. This dissociation causes a significant reduction in the
mobility of the fragment. The position in the gel at which the level of denaturant is critical for a 
particular DNA fragment is a function of the Tm of the DNA fragment and is therefore different for 
wild-type versus mutant fragments. Consequently, upon migration to the position at which the level of
denaturant is at the critical point, for either the wild-type or the mutant fragment, the mobility of these
two molecules will become different, thus resulting in their separation. The mutation detection rate of 
DGGE approaches 100%. Although the technique of DGGE is relatively simple to perform, and does 
not require radioisotopes or toxic chemicals, it does require some specialized equipment. Furthermore, 
DGGE can only be used to analyze fragments between 100 and 800 bp due to the resolution limit of
polyacrylamide gels. DGGE is advantageous over other methods useful for detecting sequence 
variations because the behavior of DNA molecules on DGGE gels can be modeled by computer 
thereby making it possible to accurately predict the detectability of a mutation in a given fragment. 
Genomic DNA fragments can be efficiently transferred from the gel following DGGE as described in 
US Patent No. 5,190,856. 

Chemical Cleavage of Mismatches 

Chemical cleavage of mismatch (CCM) is another technique for detection of sequence 
variations that is useful according to the invention. CCM is based upon the ability of hydroxylamine 
and osmium tetroxide to react with the mismatch in a DNA heteroduplex and the ability of piperidine 
to cleave the heteroduplex at the point of mismatch. According to the method of CCM, sequence 
variations are detected by the appearance of fragments that are smaller than the untreated 
heteroduplex following denaturing polyacrylamide gel electrophoresis. 

DNA fragments up to 1 kb in size can be analyzed by CCM with a probable 100% detection
rate for sequence variation. CCM is particularly useful for either detecting all of the sequence 
variations in a particular fragment of DNA or for determining that there are no sequence variations in 
a particular fragment of DNA. 

Constant Denaturant Capillary Electrophoresis (CDCE) Analysis 

CDCE analysis is particularly useful in high throughput screening, i.e., wherein large numbers 
of DNA samples are analyzed. CDCE analysis combines several elements of both replaceable linear 
polyacrylamide capillary electrophoresis and constant denaturant gel electrophoresis. The technique of 
CDCE is a rapid, high resolution procedure that demonstrates a high dynamic range, and is 
automatable. The method of CDCE, as described in detail in Khrapko et al., 1994, Nucleic Acids Res. 
22:364, involves the use of a zone of constant temperature and a denaturant concentration in capillary 
electrophoresis. Linear polyacrylamide gel electrophoresis is performed at viscosity levels that permit 
facile replacement of the matrix after each run. For a typical 100 bp fragment of DNA, point 
mutation-containing heteroduplexes are separated from wild type homoduplexes in less than 30 
minutes. Using laser-induced fluorescence to detect fluorescent-tagged DNA, the system has an
absolute limit of detection of 3 x 10^4 molecules with a linear dynamic range of six orders of magnitude.
The relative limit of detection is about 3/10,000, i.e., 100,000 mutant sequences are recognized among
3 x 10^8 wild type sequences. This approach is applicable to analysis of low frequency mutations, and
to genetic screening of pooled samples for detection of rare variants.

RNase Cleavage

An additional method for genotyping that is useful according to the invention is RNase
Cleavage. Various ribonuclease enzymes, including RNase A, RNase T1 and RNase T2, specifically
digest single stranded RNA. When RNA is annealed to form double stranded RNA or an RNA/DNA 
duplex, it can no longer be digested with these enzymes. However, when a mismatch is present in the 
double stranded molecule, cleavage at the point of mismatch may occur. 

RNase Cleavage is preferably performed with RNase A. Ribonuclease A specifically digests
single stranded RNA but can also cleave heteroduplex molecules at the point of mismatch. The extent
of cleavage at single base mismatches depends on both the type of mismatch, and the sequence of 
DNA flanking the mismatch. Sequence variations leading to mismatch are indicated by the presence 
of fragments that are smaller than the uncleaved heteroduplex on denaturing polyacrylamide gels. 
According to the invention, RNase Cleavage involves forming a heteroduplex between a
radiolabeled single stranded RNA probe (riboprobe) and a PCR product derived from a biological
sample. If a point mutation is present in the PCR product, following treatment of the resulting
RNA/DNA heteroduplex with RNase A, the RNA strand of the duplex may be cleaved. The sample
is then denatured by heating and analyzed on a denaturing polyacrylamide gel. If the RNA probe has
not been cleaved, it will be the same size as the PCR product. If the probe has been cleaved, it will be
smaller than the PCR product. RNase Cleavage can be used to easily detect a 1 bp deletion.
However, small insertions may not be as easily detected as small deletions by RNase Cleavage, as
'looping-out' occurs on the target strand rather than the probe strand.

Heteroduplex Analysis 

Another method for genotyping according to the invention is heteroduplex analysis.
Heteroduplex molecules, i.e., double stranded DNA molecules containing a mismatch, can be
separated from homoduplex molecules on ordinary gels. The exact rate of detection of sequence 
variations by heteroduplex analysis is unknown, but is clearly significantly lower than 100%. 
Presumably, the sequence of DNA flanking the mismatch, rather than the actual mismatch, affects the
detectability. Mismatches that are located in the middle of a DNA fragment are detected most easily.
Although heteroduplex analysis is less sensitive than some of the other genotyping methods described,
it may be considered useful according to the invention due to its simplicity.

Mismatch Repair Detection (MRD)

Another technique that is useful for genotyping according to the invention is mismatch repair 
detection (MRD). MRD is an in vivo method that detects DNA sequence variation by the occurrence 
of a change in bacterial colony color. DNA fragments to be screened for variation are cloned into two 
MRD plasmids, and bacteria are transformed with heteroduplexes of these constructs. The resulting
colonies are blue in the absence of a mismatch and white in the presence of a mismatch. MRD can be
used to detect a single mismatch in a DNA fragment as large as 10 kb in size. MRD permits
high-throughput screening of genetic mutations, and is described in detail in Faham et al., 1995, Genome
Research 5:474.

Mismatch Recognition by DNA Repair Enzymes 

Another technique that is useful for detecting sequence variations according to the invention is 
Mismatch Recognition by DNA Repair Enzymes. The E. coli mismatch correction systems are
well understood. Three of the proteins required for the methyl-directed DNA repair pathway (MutS, MutL
and MutH) are sufficient to recognize 7 of the possible 8 single base-pair mismatches (C/C mismatches
are not recognized) and cut/nick the DNA at the nearest GATC sequence. The MutY protein, which
is involved in a distinct repair system, can also be used to detect A/G and A/C mismatches. Some
mammalian enzymes are also useful for mismatch recognition: thymidine glycosylase can recognize all
types of T mismatch, and 'all-type endonuclease' or Topoisomerase I is capable of detecting all 8
mismatches, but does so with varying efficiencies, depending on both the type of mismatch and the
neighboring sequence.

The MutS gene product is the methyl-directed repair protein which binds to the mismatch. 
Purified MutS protein has been used to detect mutations by several different methods. Gel mobility 
assays can be performed in which DNA bound to the MutS protein migrates more slowly through an 
acrylamide gel than free DNA. This method has been used to detect single base mismatches.

An alternative method for the use of MutS in mismatch recognition, which does not require gel 
electrophoresis, involves the immobilization of MutS protein on nitrocellulose membranes. Labeled 
heteroduplexed DNA is used to probe the membrane in a dot-blot format. When both DNA strands 
are used, all mismatches can be recognized by binding of the DNA to the protein attached to the 
membrane. Although C/C mismatches are not detected, the corresponding G/G mismatch derived
from the other strand is recognized. This technique is particularly useful because it is simple, 
inexpensive, and amenable to automation. However, the detection efficiency of this method may be 
limited by the size of the DNA fragment. In particular, this method works well for very short 
fragments.

Sequencing by Hybridization (SBH)

An alternative method for detecting sequence variations according to the invention is 
sequencing by hybridization (SBH). According to this method, arrays of short (8-10 base long)
oligonucleotides are immobilized on a solid support in a manner similar to the reverse dot-blot protocol,
and probed with a target DNA fragment. In particular, oligonucleotides are synthesized together and 
directly onto the support. 

The synthesis system begins with a silicon chip coated with a nucleotide linked to a light-sensitive
chemical blocking group. Light is used to illuminate particular grid co-ordinates, removing the blocking
group at these positions. The chip is then exposed to the next photoprotected nucleotide, which
polymerizes onto the exposed nucleotides.

In this manner, as a result of successive rounds of nucleotide additions, oligonucleotides of 
different sequences can be synthesized at different positions on the solid support. Thirty-two cycles of 
specific additions (i.e., 8 additions of each of the four nucleotides) should enable the production of all
65,536 possible 8-mer oligonucleotides, at defined positions on the chip. 
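
The figures in the preceding paragraph can be checked with a short sketch. The function names `all_kmers` and `synthesis_cycles` are illustrative only, and the cycle count assumes, as stated above, one addition step per nucleotide type at each of the k positions.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Return every possible oligonucleotide of length k."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def synthesis_cycles(k):
    """Addition cycles needed if each position takes one step per base."""
    return len(BASES) * k


print(len(all_kmers(8)))    # 65536 possible 8-mers (4**8)
print(synthesis_cycles(8))  # 32 cycles
```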

When the chip is probed with a DNA molecule, e.g., a fluorescently labeled PCR product,
fully matched hybrids should give a high intensity of fluorescence and hybrids with one or more
mismatches should give substantially less intense fluorescence. The combination of the position and
intensity of the signals on the chip enables computers to derive the sequence of the DNA molecule
being analyzed for the presence of sequence variations. 
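
A minimal sketch of how a sequence might be derived from the set of fully matched oligonucleotides is given below. It is an idealization and not the method of any particular instrument: it assumes error-free hybridization, a probe length k of at least 2, and a target in which no (k-1)-mer occurs twice. The names `spectrum` and `reconstruct` and the example sequences are hypothetical.

```python
def spectrum(seq, k):
    """Set of all k-mers in seq, i.e. the ideal pattern of positive spots."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    suffixes = {m[1:] for m in kmers}
    # The first k-mer is the only one whose prefix is not another's suffix.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting k-mer")
    seq = starts[0]
    remaining = kmers - {seq}
    while remaining:
        tail = seq[-(k - 1):]
        candidates = [m for m in remaining if m[:-1] == tail]
        if len(candidates) != 1:
            raise ValueError("ambiguous or incomplete spectrum")
        seq += candidates[0][-1]
        remaining.remove(candidates[0])
    return seq


reference = "ATGGCGTGCA"
variant = "ATGGCTTGCA"   # single base change (G to T) at the sixth base
print(reconstruct(spectrum(variant, 4)))                 # ATGGCTTGCA
print(len(spectrum(reference, 4) - spectrum(variant, 4)))  # 4 spots lost
```

In this example the single base change removes four reference 4-mers from the pattern, which is why a substitution is visible as a cluster of changed signals rather than a single spot.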

Allele-Specific Oligonucleotide Hybridization 

The technique of allele-specific oligonucleotide (ASO) hybridization or the 'dot-blot' is also
useful for genotyping according to the invention. Under specific hybridization conditions, an
oligonucleotide will only bind to a PCR product if the two are 100% identical. A single base pair
mismatch is sufficient to prevent hybridization. A pair of oligonucleotides, one carrying the wild type
base and the other carrying a single base change, as compared to the wild type sequence, can be used
to determine if a PCR product is homozygous wild type, heterozygous or homozygous mutant for a
particular base change. When performing conventional dot blots, the PCR product is fixed onto a nylon
membrane and probed with a labeled oligonucleotide. When performing a 'reverse dot blot', an
oligonucleotide is fixed to a membrane and probed with a labeled PCR product. The probe may be
isotopically labeled, or non-isotopically labeled. The technique allows for the genotyping of multiple
PCR amplified samples for the presence of a single base change.

Allele-Specific PCR

Many methods for identifying sequence variations involve the analysis of PCR-amplified 
DNA. The allele-specific polymerase chain reaction (also called the amplification refractory mutation
system or ARMS) comprises an assay that occurs during the PCR reaction itself. ARMS requires the
use of sequence-specific PCR primers which differ from each other at their terminal 3' nucleotide and
are designed to amplify only the normal allele in one reaction, and only the mutant allele in another
reaction. When the 3' end of a specific primer is 100% identical to the target, amplification occurs.
When the 3' end of a specific primer is not 100% identical to the target, amplification does not occur.
Agarose gel electrophoresis is used to detect the presence of an amplified product. A homozygous
wild-type sample generates product in only the wild-type reaction, a heterozygous sample is
characterized by amplification products in both reactions, and a homozygous mutant sample generates
product in only the mutant reaction.
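
The interpretation of the two allele-specific reactions can be summarized as a simple decision rule. The sketch below is illustrative; the function name `arms_genotype` and the "no call" outcome for a sample in which neither reaction amplifies are assumptions, not part of the method as described.

```python
def arms_genotype(wild_type_product, mutant_product):
    """Call a genotype from the presence of product in each ARMS reaction."""
    if wild_type_product and mutant_product:
        return "heterozygous"
    if wild_type_product:
        return "homozygous wild type"
    if mutant_product:
        return "homozygous mutant"
    # Neither reaction amplified: treated here as a failed assay.
    return "no call"


print(arms_genotype(True, True))    # heterozygous
print(arms_genotype(False, True))   # homozygous mutant
```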

This technique can be modified so that the 5' ends of the allele-specific primers are labeled
with different fluorescent labels, and the 5' ends of the common primers are biotin labeled. According
to this alternate protocol, the wild-type specific and the mutant-specific reactions are performed in a
single tube. The advantages of this approach are that a gel electrophoresis step is not required, and the 
method is amenable to automation. 

Primer-Introduced Restriction Analysis

The method of primer-introduced restriction analysis (PIRA) can also be used for genotyping 
according to the invention. PIRA is a technique which allows known sequence variations to be 
detected by restriction digestion. By introducing a base change close to the position of a known 
sequence variation (for example by using a PCR primer containing a mismatch, as compared to the
target sequence), it is possible to create a restriction endonuclease recognition site that indicates the
presence of a particular sequence change. The combination of the altered base in the primer sequence
and the altered base at the mutation site creates a new restriction enzyme target site. This approach
may be used to create a new restriction enzyme site in either the wild-type allele or the mutant allele.
If a novel restriction enzyme site is introduced in the mutant allele then, following digestion with the
appropriate restriction enzyme, the homozygous wild-type form would produce a single band of the
full-length size, the homozygous mutant form would produce a single band of the reduced size and the
heterozygous form would produce both full length and reduced sized bands. Band size is analyzed
by gel electrophoresis.
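
For the case in which the novel site is introduced in the mutant allele, the band patterns described above can be scored as sketched below. The function name `call_pira_genotype`, the sizing tolerance, and the fragment sizes in the example (150 bp full length, 128 bp after digestion) are hypothetical values chosen for illustration.

```python
def call_pira_genotype(observed_bp, full_bp, reduced_bp, tol_bp=5):
    """Call a genotype from observed band sizes after restriction digestion.

    Assumes the restriction site is created in the mutant allele, so the
    mutant product is cut to reduced_bp and the wild type stays at full_bp.
    """
    if full_bp - reduced_bp <= 2 * tol_bp:
        raise ValueError("bands too close to resolve at this tolerance")
    has_full = any(abs(b - full_bp) <= tol_bp for b in observed_bp)
    has_reduced = any(abs(b - reduced_bp) <= tol_bp for b in observed_bp)
    if has_full and has_reduced:
        return "heterozygous"
    if has_full:
        return "homozygous wild type"
    if has_reduced:
        return "homozygous mutant"
    return "no call"


print(call_pira_genotype([151], 150, 128))       # homozygous wild type
print(call_pira_genotype([149, 129], 150, 128))  # heterozygous
```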

Oligonucleotide Ligation Assay

The technique of oligonucleotide ligation can also be used for genotyping according to the 
invention. 

The method of oligonucleotide ligation is based on the following observations. If two 
oligonucleotides are annealed to a strand of DNA and are exactly juxtaposed, they can be joined by
the enzyme DNA ligase. If there is a single base pair mismatch at the junction of the two 
oligonucleotides then ligation will not occur. According to the method of oligonucleotide ligation, the 
two oligonucleotides used in the assay are modified by the addition of two different labels. According 
to this method, the assay for a ligated product involves detecting a ligated product by assaying for the 
appearance of the labels of the two oligonucleotides on a single molecule rather than visualization of a
new, larger sized DNA fragment by gel electrophoresis. 

When ligation reactions are conducted in 96-well microtiter plates and ligation is scored by
ELISA, the oligonucleotide ligation assay can be performed by a robot and the results can be analyzed
by a plate reader and fed directly into a computer. This method is therefore extremely useful for
detecting the presence of a sequence variation in a large number of samples. The oligonucleotide
ligation assay is performed on PCR-amplified DNA. A modification of this assay, termed the ligase
chain reaction, is performed on genomic DNA and involves amplification with a thermostable DNA
ligase.

Direct DNA Sequencing

Genotyping according to the invention may also be carried out by directly sequencing the 
DNA sample in the region of the gene of interest, using DNA sequencing procedures well-known in 
the art (described above in Section D, entitled "Isolation of a Wild Type Gene"). 

Mini-Sequencing

The technique of mini-sequencing (also known as single nucleotide primer extension) can also 
be used to detect any known point mutation, deletion or insertion, according to the invention. Obtaining 
sequence information for just a single base pair only requires the sequencing of that particular base. 
This can be done by including only one base in the sequencing reaction rather than all four. When this 
base is labeled and complementary to the first base immediately 3' to the primer (on the target strand),
the label will be incorporated. Thus, a given base pair can be sequenced on the basis of label
incorporation or failure of incorporation without the need for electrophoretic size separation. 
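
The logic of single nucleotide primer extension can be sketched as follows. The conventions are assumptions made for this example: the target strand is written 5' to 3', the primer is the reverse complement of its annealing site, and the polymerase adds to the 3' end of the primer the base complementary to the next template position. The sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def incorporated_base(target, primer):
    """Base added to the primer's 3' end when annealed to target (5'->3')."""
    site = reverse_complement(primer)
    i = target.find(site)
    if i == -1:
        raise ValueError("primer does not anneal to the target")
    if i == 0:
        raise ValueError("no template base available for extension")
    # The template base read next lies just 5' of the annealing site.
    return target[i - 1].translate(COMPLEMENT)


def label_incorporated(target, primer, labeled_base):
    """True if the single labeled nucleotide supplied would be added."""
    return incorporated_base(target, primer) == labeled_base


primer = "GATCCGAA"
wild_type = "CCGTAGTTCGGATC"
mutant = "CCGTAATTCGGATC"
print(label_incorporated(wild_type, primer, "C"))  # True
print(label_incorporated(mutant, primer, "C"))     # False
```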

5' Nuclease Assay

Genotyping according to the invention can also be performed by the method of 5' nuclease 
assay. The 5' nuclease assay is a technique that monitors the extent of amplification in a PCR 
reaction on the basis of the degree of fluorescence in the reaction mix. A low level of fluorescence
indicates no amplification or very poor amplification and a high level of fluorescence indicates good
amplification. This system can be adapted to permit identification of known sequence variations, 
without the need for any post-PCR analysis other than fluorescence emission analysis. 

PCR amplification is detected by measuring the 5' to 3' exonuclease activity of Taq
polymerase. Taq polymerase cleaves 5' terminal nucleotides of double stranded DNA. The preferred
substrate for Taq polymerase is a partially double stranded molecule. Taq polymerase cleaves the
strand that contains the closest free 5' end. According to the 5' nuclease assay, an oligonucleotide
'probe', which is phosphorylated at its 3' end so as to render it incapable of serving as a DNA
synthesis primer, is included in the PCR reaction. The probe is designed to anneal to a position
between the two amplification primers. When an actively extending Taq polymerase molecule reaches
the probe molecule, it partially displaces the probe and then cleaves the probe at or near the single
stranded/double stranded junction. The polymerase continues this process of displacement and
cleavage until the entire probe is broken up and removed from the template. The probe is labeled in a
manner that permits detection of the removal of the probe. In particular, the probe is labeled at
different positions with two different fluorescent labels. One label has a localized quenching effect on
the fluorescence of the other (reporter) label. This effect is mediated by energy transfer from one dye
to the other, and requires that the two dyes are in close proximity to each other. If the probe is cleaved
at a position between the reporter and the quencher dyes, the two dyes become physically separated,
thereby resulting in an increase in fluorescence which is proportional to the yield of the PCR product.

Representational Difference Analysis (RDA) 

Genotyping according to the invention can also be carried out by Representational Difference 
Analysis (RDA). RDA is described in detail in Lisitsyn et al., 1993, Science 259:946, and an
adaptation which combines selective breeding with RDA is described in Lisitsyn et al., 1993, Nature
Genet. 6:57. RDA identifies sequence dissimilarities through the application of a powerful approach to
subtractive hybridization. According to the method of RDA, one first creates simplified
representations, called amplicons, from two samples that are being compared. An amplicon can
comprise, for example, the set of BglII fragments that are small enough to be amplified by the PCR.
The iterative subtraction step begins with the ligation of a special adaptor to the 5' end of fragments
contained in the amplicon derived from the test sample (tester amplicon). The tester amplicon is then 
melted and briefly reannealed in the presence of a large excess of amplicon, derived from the wild 
type sample (driver amplicon). Those tester fragments that reanneal to one another (presumably fragments absent
from the wild type, driver amplicon) can serve as a template for the addition of the adaptor sequence
to the 3'-end of the "partner" fragment. As a result, these tester fragments can be exponentially
amplified by PCR. This procedure is then repeated to achieve successively higher enrichment.

RDA may be used to clone sequences that are either wholly absent from the wild type sample 
or are present in the wild type DNA, but are contained in a restriction fragment that is too large to be
amplified in the amplicon. The former case may arise from a total deletion; the latter from a restriction
fragment length polymorphism with the short allele present in the tester but not the wild type DNA. 
RDA is useful for subtracting DNA from an individual with a particular disease from normal DNA so 
as to identify regions showing homozygous or heterozygous deletions; locating fragments present in a 
parent with a dominant disorder but absent in his unaffected offspring; and locating mRNAs expressed
in normal tissue but not present in tissue isolated from an individual with a particular disease.

Denaturing High Performance Liquid Chromatography

According to the scanning method of Denaturing High Performance Liquid Chromatography 
(DHPLC), partial heat denaturation and a linear acetonitrile gradient are used to identify
polymorphisms in DNA fragments. DHPLC provides a method of comparative DNA sequencing
based on the capability of ion-pair reverse phase liquid chromatography on alkylated nonporous
poly(styrene divinylbenzene) particles to resolve homoduplex from heteroduplex molecules under conditions
of partial denaturation. This method can potentially be automated to allow for rapid analysis of a large
number of samples (Underhill et al., 1996, Proc. Natl. Acad. Sci. USA, 93:196).

Mass Spectroscopy 

Matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectroscopy is 
another method according to the invention by which genotyping can be performed. The method of 
MALDI-TOF mass spectroscopy is based on the irradiation of crystals formed by suitable small 
organic molecules (referred to as the matrix) with a short laser pulse at a wavelength close to the
resonant absorption band of the matrix molecules. This causes an energy transfer and desorption
process producing matrix ions. Low concentrations of nucleic acid molecules are added to the matrix
molecules while in solution and become embedded in the solid matrix crystals upon drying of the
mixture. The intact nucleic acids are then desorbed into the gas phase and ionized upon irradiation with
a laser, allowing their mass analysis. MALDI is used primarily with time-of-flight spectrometers,
where the time of flight is related to the mass-to-charge ratio of the nucleic acid molecules
(reviewed in Griffin, T.J. and Smith, L.M., 2000, Trends Biotech. 18:77).

Genotyping can be performed by any of the following MALDI-TOF mass spectroscopy
approaches, including sequencing of PCR products (Fu, D-J. et al., 1998, Nat. Biotechnol. 16:381;
Kirpekar, F. et al., Nucleic Acids Res. 26:2554), direct mass-analysis of PCR products (Ross, P.L. et
al., 1998, Anal. Chem. 70:2067), analysis of allele-specific PCR (Taranenko, N.I. et al., 1996, Genet.
Anal. Biomol. Eng. 13:87) or LCR (ligase chain reaction; Jurinke, C. et al., 1996, Anal. Biochem.
237:174) products, analysis of RFLP-PCR products (Srinivasan, J.R. et al., 1998, Rapid Commun.
Mass Spectrom. 12:1045), minisequencing (Haff, L.A. and Smirnov, I.P., 1997, Genome Res. 7:378;
Higgins, G.S. et al., 1997, BioTechniques 23:710), analysis of PNA (peptide nucleic acid) hybridization
probes (Griffin, T.J. et al., 1997, Nat. Biotech. 15:1368; Ross, P.L., Anal. Chem. 69:4197;
Jiang-Baucom, P. et al., 1997, Anal. Chem. 69:4894), or direct analysis of invasive cleavage products
(Griffin, T.J. et al., 1999, Proc. Natl. Acad. Sci. USA 96:6301).

6. Methods of Specifying a Polymorphism 

The invention provides methods for specifying a particular polymorphism. By "specifying a
polymorphism" is meant defining a polymorphism in the context of a larger region of nucleic acid
which contains the polymorphism, and is of sufficient length to be easily differentiated from any other
position in the genome. 

A unique nucleotide position (e.g. a polymorphic site) in the human genome can be specified 
by describing a unique sequence of DNA within the genome, and providing the location of the unique 
nucleotide position relative to that sequence. Preferably this is done by providing the sequence identity 
of a length of unique DNA containing the polymorphism, and indicating which of the nucleotide sites is
polymorphic. 

A calculation can be made to determine a sequence length which will be unique in the 3 billion 
nucleotide human genome. If it is assumed that the genome contains equal numbers of the nucleotides 
A, G, C and T, and that they occur randomly in the genome, one can determine the probability of any 
given sequence of a defined length occurring in the genome; a random 12mer will appear in a random
3,000,000,000 bp genome 179 times, a random 15mer will appear in a random 3,000,000,000 bp
genome 3 times and a random 16mer will appear in a random 3,000,000,000 bp genome 1 time.
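
The calculation above can be reproduced with the following sketch, which makes the same simplifying assumptions (equal base frequencies, random sequence, a 3,000,000,000 bp genome) and approximates the expected count as the genome length divided by 4^k. The function name `expected_occurrences` is illustrative.

```python
GENOME_BP = 3_000_000_000


def expected_occurrences(k, genome_bp=GENOME_BP):
    """Expected count of a given k-mer in a random, equal-frequency genome."""
    return genome_bp / 4 ** k


for k in (12, 15, 16):
    print(k, round(expected_occurrences(k)))
# 12 179
# 15 3
# 16 1
```

The unrounded value for a 16mer is about 0.7, so "1 time" is a rounded figure.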

Thus, it would appear that specifying 16 bp would uniquely define a sequence in the genome. 
However, the genome is not composed of random sequence and does not contain equal amounts of A,
G, C and T. In fact, 10-12 bp sequences are likely to be specific for 95% of genes. Some sequences 
may even be specified by as few as 8 nucleotides. The minimum sequence length that is useful 
according to the invention for identifying polymorphisms in most gene and intergenic sequences is
approximately 9-15 bp.

In the case of repeat sequences and sequences associated with gene families, the probability 
of observing a particular sequence is greatly increased and it becomes difficult to specify a 
polymorphism in the context of a sequence that is only on the order of 9-15 bp. There are many types 
of repeats including tandem repeats, where a larger sequence block has within it smaller repeat units 
(e.g. microsatellites). Tandem repeats usually occur within non-genic areas, but can also occur within 
genes and subsequently affect gene function; they can be 10-1000s of bp long, or, if located in 
centromeres and telomeres, be megabase sized. Some repeats are composed of blocks which do not 
have sub-repeat units and are non-functional (e.g. ~300 bp Alu repeats). These occur by
duplication/dispersal throughout the genome. 

It may be difficult to specify a polymorphism that occurs in a gene that is a member of a gene 
family. Through the mechanism of gene duplication, gene families, comprising multiple copies of a 
gene in which some, but not all of the DNA sequence has diverged, have been formed. Thus, certain 
regions of a gene may be conserved in different gene family members. With time, a duplicated gene
can lose function and the sequence of the duplicated gene can deteriorate; the amount of homology 
between the original gene and the duplicated version depends upon the time since duplication. Other 
duplications maintain function and retain some level of similarity with the original gene in the important 
domains. Some related genes can share nearly 100% homology across a region that is hundreds of bp 
long, and yet have no significant homology at any other location. In these cases, it may be necessary 
to specify dozens or more nucleotides to provide a unique sequence. 

To identify a unique sequence, a search must be done wherein a specific sequence is 
compared to all known human sequences and the minimum unique sequence is defined. However, in 
the absence of a complete sequence for the human genome, it cannot be guaranteed that a sequence
is truly unique. Empirical experimentation can be used to determine the minimum sequence for 
specificity/uniqueness. In the case of a gene family member, if sequence information is available for 
the region corresponding to the region of interest in other members of the gene family, then it may be
possible to define a unique short (9-15 bp) sequence that contains a polymorphism and has specificity. 
In the event that a particular region cannot be defined as unique, a larger region of nucleic acid which 
contains the polymorphism will be required to define a polymorphism in a gene that is a member of a 
gene family. It is predicted that a sequence of 9-15 bp will be sufficient to define a polymorphism in
99% of all cases. 

Methods of specifying a polymorphism that involve using sequences which either encompass 
or overlap the polymorphic site to be tested or do not encompass or overlap the polymorphic site to be 
tested are useful according to the invention and are described below. 

Oligonucleotide Hybridization. 

An oligonucleotide is designed such that it is specific for a target sequence, and hybridizes 
only at the target sequence site. This oligonucleotide will not hybridize if the target sequence differs at 
the position in the sequence to be tested. Another oligonucleotide is designed such that it hybridizes 
with the polymorphic form of the sequence. A DNA sample is tested for hybridization with each of 
the two probes independently. If the DNA hybridizes to only one of the probes, it can be concluded 
that the individual is homozygous for the corresponding sequence. If both probes hybridize to a test 
DNA sample, then the individual is heterozygous. Hybridization will be detected by the method of 
Southern blot analysis (as described in Section C entitled "Production of a Nucleic Acid Probe"). 
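
The interpretation of the two independent hybridization results can be summarized as a small decision rule. The following Python sketch is illustrative only; the function name and the returned labels are hypothetical, and a "no call" outcome is added as an assumption for the case in which neither probe hybridizes.

```python
def call_genotype(hybridizes_reference, hybridizes_variant):
    """Interpret the results of two independent probe hybridizations."""
    if hybridizes_reference and hybridizes_variant:
        return "heterozygous"
    if hybridizes_reference:
        return "homozygous for the target sequence"
    if hybridizes_variant:
        return "homozygous for the polymorphic sequence"
    # Neither probe bound: assumed here to indicate a failed assay.
    return "no call"


print(call_genotype(True, True))
```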

Specifying a Polymorphism by PCR 

An alternative method for specifying a particular polymorphism involves a PCR-based 
strategy. According to this method, a region of a candidate gene to be tested is amplified by PCR (as 
described). The amplified fragment is digested with a restriction enzyme that will not cut a fragment 
that contains a polymorphism, due to the location of the polymorphism within the recognition site of this 
restriction enzyme. The products of the digestion reaction mixture are size separated in an agarose gel, 
stained with ethidium bromide, and visualized under ultraviolet light to determine if the amplified 
product has been digested. According to this method, the PCR primers provide the specificity for a 
particular polymorphism by virtue of the specific sequence of the two primers, as well as by the 
location of the primer binding sites in the target DNA. Although multiple sites for primer binding may 
exist in a target DNA sequence, only the sites that are close enough together will produce an amplified 
product that includes the nucleic acid region containing the polymorphism. 
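
The expected outcome of the digestion step can be sketched as follows. This Python example is illustrative only: the amplicon sequences are invented, the function name is hypothetical, and EcoRI (recognition sequence GAATTC, cutting after the first G) is used merely as a familiar example of an enzyme whose site may be abolished by a single base change.

```python
def digest(amplicon, recognition_site, cut_offset):
    """Return the fragment lengths produced by cutting a linear amplicon
    at every occurrence of recognition_site (cut_offset bases into the site)."""
    cut_positions = []
    start = amplicon.find(recognition_site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = amplicon.find(recognition_site, start + 1)
    fragments = []
    previous = 0
    for position in cut_positions:
        fragments.append(position - previous)
        previous = position
    fragments.append(len(amplicon) - previous)
    return fragments


# Hypothetical amplicons differing at one base inside the recognition site.
site_intact = "ATGCGAATTCTTAGC"
site_altered = "ATGCGAGTTCTTAGC"
print(digest(site_intact, "GAATTC", 1))   # two fragments: the site is cut
print(digest(site_altered, "GAATTC", 1))  # one fragment: the site is lost
```

On a gel, the first case would appear as two smaller bands and the second as a single band of the full amplicon length.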

Alternatively, a PCR reaction is carried out with PCR primers that contain polymorphisms. 
According to this embodiment, if the template nucleic acid lacks the polymorphism present in the 
primers there will be no PCR product. Thus, according to this embodiment of the invention, the 
absence of a PCR product indicates that a polymorphism is not present in the target sequence. 

Primer Extension 

A DNA fragment comprising the region containing a polymorphism is PCR amplified from an 
individual to be tested. The PCR product is denatured and one strand is retained for analysis. An 
oligonucleotide probe is designed such that it is specific for a region in the sequence and hybridizes 
such that its 3' terminal nucleotide is paired with the nucleotide adjacent to the one to be tested. The 
PCR product and probe are combined with a polymerase and terminating, differentially colored 
nucleotides. The polymerase extends the probe by one base, and only the base which is 
complementary to the site being tested is added. The reaction is washed, and the color of the reaction 
indicates the nucleotide that has been added and the sequence at the position of interest. 
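
The base added in the single-base extension can be predicted from the template sequence. The following Python sketch is illustrative only; the function names and sequences are hypothetical, the retained strand is assumed to be written 5' to 3', and the probe is assumed to anneal perfectly at a single location.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extended_base(template, probe):
    """Return the terminating nucleotide added to the 3' end of the probe.

    template: retained single strand, written 5' to 3'.
    probe: oligonucleotide, written 5' to 3', complementary to the template.
    """
    annealed_region = reverse_complement(probe)
    start = template.find(annealed_region)
    if start <= 0:
        return None  # probe does not anneal, or no template base left to copy
    # The probe's 3' end pairs with template[start]; the next template
    # base to be copied is the one immediately 5' of it.
    return COMPLEMENT[template[start - 1]]


# Hypothetical templates differing only at the tested position (index 3).
print(extended_base("TTGACCGTAGCA", "CTACGG"))
print(extended_base("TTGGCCGTAGCA", "CTACGG"))
```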

The PCR step provides one level of specificity by amplifying a region (1-10,000 bp, as desired, 
between the PCR primers) from a complex (3,000,000,000 bp) mixture. The PCR primers must 
be unique in both their hybridization specificity and their proximity to one another. Since proximity of 
the two PCR primers is needed (i.e. a distance across which a polymerase can extend to join the 
primers), shorter PCR primers can be used; e.g. in theory a small enough region could be amplified 
with an 8-10 bp binding site for a PCR primer. To ensure that a primer hybridizes with specificity, a 
primer must be at least 5 bp. 

A second level of specificity is provided by the primer which is extended in the primer 
extension reaction. Since this primer is hybridizing to a short piece of DNA, it can be short and unique 
for the fragment with which it binds. The primer is at least 5 bp and preferably 8 bp. Although the 
primer used for the primer extension step is located adjacent to the polymorphic site, the PCR 
primers should not overlap with the polymorphic site being tested. 
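
The relationship between primer length and the complexity of the nucleic acid being probed can be illustrated with a rough calculation. The Python sketch below is not part of the method described; it assumes, as a simplification, that all four bases are equally likely and independent, and the function name and the example lengths are hypothetical.

```python
def expected_random_matches(primer_length, target_length):
    """Expected number of chance occurrences of one specific sequence of
    primer_length bases within target_length bases of random sequence
    (uniform, independent bases; one strand only)."""
    if primer_length < 1 or target_length < primer_length:
        return 0.0
    positions = target_length - primer_length + 1
    return positions / 4 ** primer_length


# A 3,000,000,000 bp mixture versus a 10,000 bp amplified fragment.
for length in (5, 8, 10, 16):
    print(length,
          expected_random_matches(length, 3_000_000_000),
          expected_random_matches(length, 10_000))
```

Under these assumptions a short primer is far more likely to be unique within an amplified fragment than within the unamplified mixture, which is consistent with the second level of specificity described above.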

Southern Blotting 

One method for detecting a previously defined polymorphism involves Southern blot analysis 
of wild type and mutant DNA following digestion with a restriction enzyme which has a recognition 
sequence which includes the polymorphic site to be tested. According to this method, a particular 
restriction enzyme cuts wild type DNA but does not cut mutant DNA due to the presence of a 
polymorphism within the recognition site of this restriction enzyme. Many restriction enzymes exist 
which recognize 4 bp sequences. The resulting fragments will be size separated in an agarose gel, transferred to 
a membrane and probed with a nucleic acid probe. If the site is uncut, the fragment is one length and 
if the site is cut the fragment will be of a shorter length. 

The nucleic acid hybridization probe will provide specificity to the particular polymorphism 
being tested by defining the polymorphism in the context of a larger stretch of nucleic acid sequence. 
The nucleic acid probe may comprise the nucleic acid sequence corresponding to the region known to 
contain the polymorphism. The sequence-specific probe may be located 10, 100, 1000, or even hundreds of 
thousands of bases from the region containing the polymorphism. If the probe is located some distance 
from the region containing the polymorphism, an intervening recognition site for the restriction enzyme 
cannot be located between the probe hybridization site and the region of interest containing the 
polymorphism site. Typically, a hybridization probe useful according to this method will be much larger 
than the minimum length of a sequence (9-15 bp) required to give specificity to, or define, a particular 
polymorphism. 

Alternatively, a chemical or enzyme which recognizes a unique pair of nucleotides at the site 
of a polymorphism can be used to detect the polymorphism. According to this method, the amount of 
sequence required for recognition by a chemical or enzyme is 2 bp (providing that the 2 bp sequence is 
unique in a region large enough to produce a fragment which can then be bound by a specific probe). 

According to a variation of the above method, a labeled chemical or enzyme which binds to 
one sequence of the polymorphic recognition site and not another is used. This method involves the 
steps of digesting the DNA with a restriction enzyme, and adding a labeled, sequence-specific binding 
protein (e.g. a restriction enzyme that lacks cleavage capability). The sequence-specific binding 
protein will bind to multiple sites in the genome, including the site to be tested. The fragments will be 
separated on a gel and then probed with a probe specific for the test sequence. If the fragment 
identified by the second probe is identical to a fragment identified by the first probe (e.g. the labeled 
chemical or enzyme), then the sequence being tested for is present. 

7. Determination of the Phenotypic Outcome of a Polymorphism 

To determine the phenotypic outcome of a polymorphism according to the invention, it is 
necessary to screen suitable populations to obtain a statistically significant measure of the association 
of a polymorphism with a particular disease (e.g. osteoarthritis). The invention provides methods for 
performing polymorphism genotyping in appropriate populations (described above). The invention also 
provides in vitro and in vivo assays useful for determining the phenotypic outcome of a polymorphism 
in a candidate gene. 
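
One common way to obtain a statistical measure of association is a chi-square test on allele counts in affected and unaffected individuals. The Python sketch below is an illustration under that assumption only; the specification does not prescribe a particular test, and the function name and the counts shown are hypothetical.

```python
import math


def allele_association(case_variant, case_reference,
                       control_variant, control_reference):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) on a 2 x 2 table of allele counts.
    Returns (chi_square, p_value)."""
    a, b, c, d = case_variant, case_reference, control_variant, control_reference
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    chi_square = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical allele counts: 30 of 100 case alleles carry the variant,
# compared with 10 of 100 control alleles.
chi_square, p_value = allele_association(30, 70, 10, 90)
print(chi_square, p_value)
```

A real study would also need to address population structure, multiple testing and genotype-level models, none of which is covered by this sketch.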

Every polymorphism has the potential to alter the genetic activity of an individual. At the level 
of a single gene, the effect of a polymorphism can range from an inconsequential, silent change, to a 
change that causes a complete loss of protein function, to a gain of aberrant or detrimental function. 
The severity of the effect of a polymorphism on gene activity will depend on the exact 
molecular consequences of the particular polymorphism. For example, alteration of a single pre-mRNA 
splicing dinucleotide could have profound effects on both the quantitative and qualitative 
properties of gene activity, since alterations in splicing efficiency can both reduce the overall level of 
normal transcription and cause "exon skipping". If the skipped exon is a coding exon, then 
exon skipping will lead to an alteration in the amino acid composition of the resulting protein and likely 
affect protein activity. To accurately assess the role of a particular polymorphism in the regulation of 
various molecular events, appropriate assays for both gene expression and protein function must be 
carried out. 
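
The consequence of skipping a coding exon can be illustrated with a simple sketch. In the Python example below, the exon sequences and function names are hypothetical, and each exon is assumed to consist entirely of coding sequence; whether the downstream reading frame is preserved then depends on whether the length of the skipped exon is a multiple of three.

```python
def skip_exon(exons, skipped):
    """Join coding exons in order, omitting the exon at index `skipped`."""
    return "".join(seq for index, seq in enumerate(exons) if index != skipped)


def frame_preserved(exons, skipped):
    """True if removing the exon leaves the downstream reading frame intact."""
    return len(exons[skipped]) % 3 == 0


# Hypothetical three-exon coding sequence.
exons = ["ATGGCC", "AAA", "GGTTAA"]
print(skip_exon(exons, 1), frame_preserved(exons, 1))
```

An in-frame skip removes the amino acids encoded by the exon, whereas an out-of-frame skip additionally alters every codon downstream of the junction.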

In vitro assays useful for determining the effects of a polymorphism on gene expression and 
protein function include, but are not limited to, the following. 

i. Transcriptional Regulation 

The transcriptional regulation of a candidate gene containing a polymorphism may be altered, 
as compared to the wild type gene. 

Promoter Activity 

If a polymorphism is located in the promoter, enhancer or repressor region of a candidate 
gene, promoter assays (well known in the art) wherein the altered promoter of the candidate gene is 
used to drive the expression of a reporter gene (e.g. CAT, luciferase, GFP) are performed. Changes 
in the transcriptional regulation of a candidate gene due to the presence of a polymorphism can also be 
detected by methods useful for measuring the level of mRNA, including S1 nuclease mapping and 
RT-PCR. 

S1 Analysis 

The S1 enzyme is a single-strand-specific endonuclease that will digest both single-stranded RNA 
and DNA. According to the method of S1 analysis, a probe that has been efficiently labeled to a high 
specific activity at the 5' end through the use of a kinase is used to determine either the amount of an 
mRNA species or the 5' end of a message. A single stranded probe that is complementary to the 
sequence of the RNA species of interest is utilized in S1 analysis. If the structure of a particular 
mRNA species is known, S1 analysis is performed with oligonucleotide probes of at least 40 bp that 
are complementary to the RNA of interest. It is preferable to use oligonucleotides wherein the 5' end 
of the oligonucleotide is complementary to the RNA. It is also preferable to use oligonucleotides 
wherein the 5' terminal residues contain dG or dC residues. If S1 nuclease analysis will be utilized to 
determine the 5' termini of an RNA species, the 3' end of the oligonucleotide should extend at least 4 
nucleotides beyond the RNA coding sequence. The inclusion of additional nucleotides facilitates 
differentiation of a band resulting from an RNA:DNA duplex and a band representing the probe. 
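
The probe geometry described above can be sketched as follows. This Python example is illustrative only: the function names, the default lengths (40 complementary nucleotides plus a 4 nucleotide 3' extension) and the toy sequence are assumptions, and the gene is assumed to be supplied as its sense strand written 5' to 3' with a known index for the first transcribed base.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_s1_probe(sense_strand, rna_start, protected=40, overhang=4):
    """Return an oligonucleotide probe, written 5' to 3'.

    The probe's 5' end is complementary to the RNA, and its 3' end
    extends `overhang` nucleotides beyond the 5' end of the RNA.
    """
    begin = rna_start - overhang
    end = rna_start + protected
    if begin < 0 or end > len(sense_strand):
        raise ValueError("sense strand too short for the requested probe")
    return reverse_complement(sense_strand[begin:end])


# Hypothetical 80 nt sense strand whose transcript starts at index 10.
sense = "ACGT" * 20
probe = design_s1_probe(sense, 10)
print(len(probe), probe)
```

After S1 digestion, the protected fragment would be shorter than the undigested probe by the length of the 3' extension, which is what allows the two bands to be distinguished.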

A hybridization probe for S1 analysis is prepared by incubating 2 pmol of an oligonucleotide in 
the presence of 150 mCi [γ-32P]ATP (3000-7000 Ci/mmol), 2.5 ml 10X T4 polynucleotide kinase buffer 
(700 mM Tris-Cl, pH 7.5, 100 mM MgCl2, 50 mM dithiothreitol, 1 mM spermidine-Cl, 1 mM EDTA), 
and 10 U T4 polynucleotide kinase at 37°C for 30-60 minutes. The radiolabeled probe is ethanol 
precipitated and resuspended at 1 ml/0.3 ng oligonucleotide or 10^5 cpm. 

The hybridization reaction is performed as follows. An amount of probe equal to 5 x 10^4 
Cerenkov counts is added to 50 mg RNA on ice and ethanol precipitated. The resulting pellet is 
resuspended in 20 ml S1 hybridization solution (80% deionized formamide, 40 mM PIPES, pH 6.4, 
400 mM NaCl, 1 mM EDTA, pH 8), denatured for 10 min at 65°C and hybridized overnight at 30°C. 
The following day, 300 ml of a mixture of 150 ml 2X S1 nuclease buffer (0.56 M NaCl, 0.1 M sodium 
acetate, pH 4.5, 9 mM ZnSO4), 3 ml 2 mg/ml single-stranded calf thymus DNA, 147 ml H2O and 300 U 
S1 nuclease is added to the hybridization reaction and incubated for 60 minutes at 30°C. Following the 
addition of 80 ml S1 stop buffer (4 M ammonium acetate, 20 mM EDTA, 40 mg/ml tRNA) the sample is 
ethanol precipitated, resuspended in formamide loading dye, denatured and analyzed on a denaturing 
polyacrylamide/urea gel of the appropriate percentage for the expected size of the protected band 
(Ausubel et al., supra). 

RT-PCR 

The method of RT-PCR is useful according to the invention for RNA expression analysis. 
According to the method of reverse transcription/polymerase chain reaction (RT-PCR), during the 
reverse transcription (RT) step the RNA is converted to first strand cDNA, which is relatively stable 
and is a suitable template for a PCR reaction. In the second step, the cDNA template of interest is 
amplified using PCR. This is accomplished by repeated rounds of annealing sequence-specific 
primers to either strand of the template and synthesizing new strands of complementary DNA from 
them using a thermostable DNA polymerase. 

An RNA sample is ethanol precipitated with a cDNA primer. It may be preferable to use a 
cDNA primer that is identical to one of the amplification primers. To the pellet is added 12 ml H2O, 
4 ml 400 mM TrisCl, pH 8.3, and 4 ml 400 mM KCl. The mixture is heated to 90°C, slow cooled to 
67°C, microfuged and incubated for 3 hours at 52°C. Following the addition of 29 ml reverse 
transcriptase buffer (per sample: 2.5 ml 400 mM TrisCl, pH 8.3, 2.5 ml 400 mM KCl, 1 ml 300 mM MgCl2, 
5 ml 100 mM DTT, 5 ml 5 mM 4dNTP mix, 2 ml actinomycin D, 11 ml H2O) and 0.5 ml (16 U) AMV 
reverse transcriptase, the sample is incubated for 1 hour at a temperature between 37°C and 55°C. 
The temperature will be adjusted in accordance with the composition of the primer and the RNA of 
interest. The sample is then extracted sequentially with phenol and chloroform, and ethanol 
precipitated. The resulting cDNA pellet is resuspended in 40 ml H2O. 5 ml of the cDNA sample is 
mixed with 5 ml of each amplification primer (~20 mM each), 4 ml 5 mM 4dNTP mix, 10 ml 10X 
amplification buffer (500 mM KCl, 100 mM TrisCl, pH 8.4, 1 mg/ml gelatin) and 70.5 ml H2O. After the 
mixture is heated for 2 minutes at 94°C, 0.5 ml (2.5 U) Taq DNA polymerase is added and the sample 
is overlaid with mineral oil. PCR amplification of the cDNA will be performed using the following 
automated amplification cycles: 39 cycles (2 minutes at 55°C, 2 minutes at 72°C, 1 minute at 94°C), 1 
cycle (2 minutes at 55°C, 7 minutes at 72°C). The number of cycles can be varied in accordance with 
the abundance of RNA (Ausubel et al., supra). 
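
The amplification cycles given above can be written down as a small data structure, which makes the total programmed time and the theoretical upper bound on amplification easy to compute. The Python sketch below is illustrative only; the names are hypothetical, ramping time between temperatures is ignored, and perfect doubling in every cycle is an idealization that is not achieved in practice.

```python
# Each entry: (number of cycles, [(temperature in °C, hold time in minutes), ...])
CYCLING_PROGRAM = [
    (39, [(55, 2), (72, 2), (94, 1)]),
    (1, [(55, 2), (72, 7)]),
]


def total_hold_minutes(program):
    """Total programmed hold time, ignoring ramping between temperatures."""
    return sum(cycles * sum(minutes for _, minutes in steps)
               for cycles, steps in program)


def total_cycles(program):
    """Total number of amplification cycles in the program."""
    return sum(cycles for cycles, _ in program)


def theoretical_fold_amplification(program):
    """Upper bound on amplification, assuming perfect doubling per cycle."""
    return 2 ** total_cycles(program)


print(total_hold_minutes(CYCLING_PROGRAM))
print(total_cycles(CYCLING_PROGRAM))
print(theoretical_fold_amplification(CYCLING_PROGRAM))
```

Reducing the first entry from 39 cycles to a smaller number is how the program would be adapted for an abundant RNA.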

If a polymorphism is located in a transcription factor binding site, assays including but not 
limited to the yeast two-hybrid assay (Fields et al., 1994, Trends Genet., 10:286) can be used to 
determine the effects of a polymorphism on transcription factor binding. 

If the protein product of the gene of interest is a DNA binding protein, the phenotypic outcome 
of a polymorphism may be impaired nuclear transport, DNA binding, chromatin assembly or chromatin 
structure, methylation or histone deacetylation. 

Nuclear Transport 

Immunocytochemical methods or cell fractionation techniques (as described above) are used 
to determine if the protein is correctly localized in the nucleus. 

The DNA binding properties of a transcription factor are determined by gel shift analysis (as 
described in Ausubel et al., supra), oligonucleotide selection, southwestern assays or by 
immunohistochemical analysis of fixed chromosomes. 

Gel Shift Analysis 

The method of gel shift analysis is used to detect sequence specific DNA-binding proteins 
from crude extracts. According to this method, proteins that bind to an end-labeled DNA fragment will 
retard the mobility of the fragment. The change in the mobility of the labeled fragment is detected by 
the appearance of a discrete band comprising the DNA-protein complex. 

A number of methods for preparing nuclear and cytoplasmic extracts useful for gel shift 
analysis are known in the art. For example, nuclear extracts are prepared according to the following 
method. A cell pellet is washed in PBS, resuspended in a volume of hypotonic buffer (10 mM HEPES, 
pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.2 mM PMSF, 0.5 mM DTT) that is approximately equal to 3 
times the packed cell volume and allowed to swell on ice for 10 minutes. Cells are homogenized in a 
glass Dounce homogenizer and the nuclei are collected by centrifugation and resuspended in a volume 
of low-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 0.02 M KCl, 0.2 mM 
EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume. Following 
the addition of a volume of high-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM 
MgCl2, 1.2 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the 
packed nuclear volume (dropwise with stirring) to the nuclei, nuclear extraction is carried out for 30 
minutes with continuous gentle stirring. The nuclei are pelleted by centrifugation and the nuclear 
extract is dialyzed against 50 volumes of dialysis buffer (20 mM HEPES, pH 7.9, 20% (v/v) glycerol, 
100 mM KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) until the conductivities of extract and 
buffer are equivalent. The extract is removed from the dialysis tubing and analyzed for protein 
concentration (Ausubel et al., supra). 
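
Because the buffer volumes in this procedure are specified relative to the packed cell volume and the packed nuclear volume, they can be computed for any preparation. The Python sketch below is illustrative only; the function names and dictionary keys are hypothetical, and all volumes are assumed to be expressed in the same unit.

```python
def nuclear_extract_volumes(packed_cell_volume, packed_nuclear_volume):
    """Buffer volumes for the extraction steps, in the units of the inputs."""
    return {
        "hypotonic_buffer": 3.0 * packed_cell_volume,
        "low_salt_buffer": 0.5 * packed_nuclear_volume,
        "high_salt_buffer": 0.5 * packed_nuclear_volume,
    }


def dialysis_buffer_volume(extract_volume):
    """Dialysis is carried out against 50 volumes of dialysis buffer."""
    return 50.0 * extract_volume


# Hypothetical preparation: packed cell volume 2.0, packed nuclear volume 1.0.
print(nuclear_extract_volumes(2.0, 1.0))
print(dialysis_buffer_volume(1.5))
```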

Probes useful for gel shift analysis include a fragment of plasmid DNA or a gel-purified 
double stranded oligonucleotide. Preferably the probe is labeled with Klenow fragment by incubating a 
100 ml solution of plasmid DNA or oligonucleotide with 100 mCi of the desired [α-32P]dNTP, 4 ml of 5 
mM 3dNTP mix and 2.5 U Klenow fragment for 20 minutes at room temperature. Upon the addition 
of 4 ml of a solution comprising 5 mM of the dNTP corresponding to the radioactive dNTP, the sample 
is incubated for 5 minutes at room temperature. The radiolabeled probe is ethanol precipitated, 
resuspended in TE buffer and gel purified. 

Gel shift analysis is performed by incubating 10,000 cpm of the labeled probe (0.1-0.5 ng) with 
2 mg poly(dI-dC)-poly(dI-dC), 300 mg BSA, and approximately 15 mg of a nuclear extract or buffered 
crude protein extract prepared, for example, as described above, for 15 minutes at 30°C. An aliquot of 
the binding reaction is analyzed by electrophoresis on a prewarmed low-ionic strength gel (e.g. a 4% 
polyacrylamide gel in TBE) and autoradiography (Ausubel et al., supra). 

Oligoselection Assays for DNA Binding Activity 

DNA binding activity is an essential property of proteins involved in many basic cell biological 
events, such as chromatin structure, transcriptional regulation, DNA replication and repair. The 
biological activity of a DNA binding protein can be assayed by defining the optimal target DNA 
binding site. Using the PCR-based primer selection technique (Blackwell, 1990, Science, 250:1104), the 
canonical nucleotide sequence defining the binding site is elucidated in vitro by mixing purified full 
length protein, or just the DNA binding domain of a protein of interest, with an oligonucleotide duplex 
pool containing a completely randomized central region flanked by primer-annealing sites. Multiple 
rounds of immunoprecipitation and amplification by PCR enrich for high affinity sites, which are 
cloned and sequenced in order to define a canonical binding site. 
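
Once the selected sites have been cloned and sequenced, a canonical binding site can be derived as a per-position consensus. The Python sketch below is illustrative only; the function name and the example sequences are hypothetical, and the selected sites are assumed to have been aligned already and trimmed to equal length.

```python
from collections import Counter


def consensus(sites):
    """Return the most frequent base at each position of aligned sites."""
    if not sites or len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    result = []
    for column in zip(*sites):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical aligned sites recovered after several rounds of selection.
selected = ["CACGTG", "CACGTG", "CATGTG", "CACGTT"]
print(consensus(selected))
```

A position weight matrix built from the same columns would retain the information that a simple consensus discards at variable positions.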

The ability of a DNA binding protein to correctly regulate chromatin assembly and structure 
can be determined by DNase hypersensitivity assays. Alternatively, coimmunoprecipitation 
experiments or Western blot analysis can be used to determine if the DNA binding protein is 
associated with a component of the chromatin. 

Southwestern Blot Assay for Protein-DNA Interactions 

The ability of a protein to bind DNA is measured by using the "Southwestern" blot technique 
(for example see Antalis et al., 1993, Gene, 134:201). According to this method, radiolabeled DNA is 
incubated with protein that has been immobilized on nitrocellulose filters and the amount of bound 
DNA is measured by scintillation counting or autoradiography followed by densitometry. The protein 
to be tested can be pure protein, immunoprecipitated protein, crude cell lysates or even recombinant 
protein denatured directly from bacterial colonies, yeast or cell culture. 

Assay of Protein Binding to Chromosomes in Vivo: Immunocytology of Fixed Chromosomes 

Numerous biologically important nuclear proteins are in direct contact with genomic DNA. 
The presence of these proteins can be detected immunocytologically by fixing metaphase 
chromosomes such that the protein is permanently fixed at the region of DNA to which it normally 
binds. The presence and cytological location of the protein can then be determined by incubating the 
fixed chromosomes with an antibody directed against the protein of interest, and performing standard 
methods of immunohistochemical staining (Zink and Paro, 1989, Nature, 337:468). 

Coimmunoprecipitation Assay for Chromatin Assembly/Structure 

If an antibody specific for a protein of interest exists, immunoprecipitation can be used to test 
for the presence of the protein (Otto and Lee, 1993, Methods Cell Biol., 37:119; Banting, 1995, In 
Gene Probes 1: A practical approach. Chapter 8: Antibody probes, pp. 225-227, IRL Press). The 
following methods are used for determining if a protein of interest is associated with a particular 
subcellular component. According to one method, proteins are immunoprecipitated with an antibody 
specific for a cellular component (e.g. chromatin or nuclear antigens), the immunoprecipitated material 
is analyzed on a gel by denaturing polyacrylamide gel electrophoresis, and western blot analysis is 
performed with an antibody specific for the protein of interest to determine if a physical association 
exists between the cellular component and the protein of interest. Various incubation and wash 
treatments of the cell lysate are used to remove background contamination and enhance the sensitivity 
of detection (Banting, 1995, supra). Alternatively, the initial immunoprecipitation can be carried out 
with the antibody specific for the protein of interest, and the western blot analysis can be performed 
with an antibody specific for a cellular component. According to a variation of this method, prior to 
immunoprecipitation the cells can be treated with a protein crosslinker to ensure that protein-protein 
interactions are maintained during immunoprecipitation. According to another variation of this method, 
proteins can be cross-linked to DNA and then precipitated (Dedon et al., 1991, Anal. Biochem., 
197:83). If DNA coprecipitates with a particular protein, this suggests that the DNA is associated with, 
and presumably bound to, the protein. The coprecipitating DNA can be sequenced to identify the 
bound sequence. 



DNase Hypersensitivity 

The transcriptionally active promoter region of a gene can be analyzed for susceptibility to 
cleavage by DNase I (Montecino et al., 1994, Biochemistry, 33:348). Efficient cleavage of genomic 
DNA is dependent on the accessibility of this enzyme to the DNA, and is influenced by several 
factors, including nucleosome packaging, overall chromatin configuration, and the presence of DNA 
binding proteins such as transcription factors. DNA sequence variations within the promoter DNA 
may have profound effects on these factors and result in aberrant regulation of gene transcription and 
ultimately abnormal biological activity of the gene. Therefore, altered gene activity around a 
polymorphic site can be detected as increased or decreased DNase I hypersensitivity (Vaishnaw et 
al., 1995, Immunogenetics, 41:354). 

Assay for DNA Methylation 

Accurate mapping of DNA methylation patterns, for example in CpG islands, which are 
unmethylated regions of DNA, is used to investigate and gain a better understanding of diverse 
biological processes such as the regulation of imprinted genes, X chromosome inactivation and tumor 
suppressor gene silencing in human cancer. DNA methylation at specific sites is most frequently 
studied by use of methylation-sensitive restriction endonucleases (for example HpaII) and Southern 
blotting (Sambrook et al., supra). The sensitivity of this method can be enhanced several hundred-fold 
by performing a ligation-mediated PCR step (as described in Steigerwald et al., 1990, Nucleic Acids 
Res., 6:1435) after enzyme treatment. An alternative strategy, termed methylation-specific PCR 
(Herman et al., 1996, Proc Natl Acad Sci USA., 93:9821), is used to determine the methylation status 
of CpG islands without the use of methylation-sensitive restriction enzymes. 

Histone Deacetylation 

Transcription of chromatin-packaged genes involves highly regulated changes in nucleosome 
structure that control DNA accessibility. Changes in nucleosome structure can be mediated by 
enzymatic complexes which control the acetylation and deacetylation of histones. Transcription 
elongation is required for the formation of the unfolded structure of transcribing nucleosomes, and 
histone acetylation is required for the maintenance of these structures (Walia et al., 1998, J. Biol. 
Chem., 3:14516). Deacetylation can be prevented by incubating cells with histone deacetylase 
inhibitors such as sodium butyrate or trichostatin A. To assay for changes in acetylation and the state 
of transcriptional activity, chromatin fractions are purified using organomercury and hydroxylapatite 
dissociation chromatographic techniques (Walia et al., supra). 

ii. Transcription Start Site 

To determine if a particular polymorphism causes a change in the transcriptional start site of a 
candidate gene, S1 nuclease mapping and primer extension can be performed. The presence of a 
polymorphism may cause an mRNA to be aberrantly expressed. In particular, a polymorphism may 
change the tissue specificity or developmental expression pattern of an mRNA species. A variety of 
molecular methods for detecting mRNA known in the art can be performed to determine the 
expression pattern of an mRNA. These methods include, but are not limited to, the following: Northern 
blot analysis, RT-PCR, S1 analysis, RNase Protection analysis, or in situ hybridization analysis of 
sections, wherein the samples are derived from multiple different tissues or from a tissue at different 
stages of development. Northern blot analysis, RT-PCR and S1 analysis can also be used to determine 
if a polymorphism results in an altered pattern of mRNA splicing. 

Northern Blotting 

The method of Northern blotting is well known in the art. This technique involves the transfer 
of RNA from an electrophoresis gel to a membrane support to allow the detection of specific 
sequences in RNA preparations. 

Northern blot analysis is performed according to the following method. An RNA sample 
(prepared by the addition of MOPS buffer, formaldehyde and formamide) is separated on an 
agarose/formaldehyde gel in 1X MOPS buffer. Following staining with ethidium bromide and 
visualization under ultraviolet light to determine the integrity of the RNA, the RNA is hydrolyzed by 
treatment with 0.05 M NaOH/1.5 M NaCl followed by incubation with 0.5 M Tris-Cl (pH 7.4)/1.5 M 
NaCl. The RNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g. 
Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art (Ausubel 
et al., supra; Sambrook et al., supra). Following transfer and UV cross linking, the membrane is 
hybridized with a radiolabeled probe in hybridization solution (e.g. in 50% formamide/2.5% 
Denhardt's/100-200 mg denatured salmon sperm DNA/0.1% SDS/5X SSPE) at 42°C. The 
hybridization conditions can be varied as necessary as described in Ausubel et al., supra and 
Sambrook et al., supra. Following hybridization, the membrane is washed at room temperature in 2X 
SSC/0.1% SDS, at 42°C in 1X SSC/0.1% SDS, at 65°C in 0.2X SSC/0.1% SDS, and exposed to film. 
The stringency of the wash buffers can also be varied depending on the amount of background signal 
(Ausubel et al., supra). 

RNase Protection Analysis 

RNase Protection analysis can be used to analyze RNA structure and amount and determine 
the endpoint of a specific RNA. 

The method of RNase protection is more sensitive than S1 analysis since it utilizes a sequence 
specific hybridization probe that is labeled to a high specific activity. The probe is hybridized to sample 
RNAs and treated with ribonuclease to remove free probe. Following ribonuclease treatment, the 
fragments comprising probe annealed to homologous sequences in the sample RNA are recovered by 
ethanol precipitation, and analyzed by electrophoresis on a sequencing gel. The presence of the target 
mRNA is indicated by the presence of an appropriately sized fragment of the probe. 

A probe is labeled by the method of in vitro transcription in the presence of [α-32P]CTP, as 
described in Section B entitled "Production of a Polynucleotide Sequence". The RNA sample to be 
analyzed is ethanol precipitated and resuspended in 30 ml hybridization buffer (4 parts formamide/1 
part 200 mM PIPES, pH 6.4, 2 M NaCl, 5 mM EDTA) containing 5 x 10^5 cpm of the probe RNA. 
The mixture is denatured for 5 minutes at 85°C and incubated at the desired hybridization temperature 
(30°C to 60°C) for >8 hours. To each reaction mixture is added 350 ml ribonuclease digestion buffer 
(10 mM Tris-Cl, pH 7.5, 300 mM NaCl, 5 mM EDTA) containing 40 mg/ml ribonuclease A and 2 
mg/ml ribonuclease T1. The sample is incubated for 30-60 minutes at 30°C. Following the addition of 
10 ml 20% SDS and 2.5 ml 20 mg/ml proteinase K, the sample is incubated for 15 minutes at 37°C. The 
sample is extracted with phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in RNA 
loading buffer (80% (v/v) formamide, 1 mM EDTA, pH 8.0, 0.1% bromophenol blue, 0.1% xylene 
cyanol), denatured and analyzed by electrophoresis on a denaturing polyacrylamide/urea gel and 
autoradiography (Ausubel et al., supra). 



Primer Extension 

The method of primer extension is used to map the 5' end of an RNA and to quantitate the 
amount of an RNA of interest by using reverse transcriptase to extend a primer that is complementary 
to a region of a given RNA. 

An oligonucleotide primer is labeled in a kinase reaction as described for S1 analysis. The 
primer extension reaction is performed by mixing 10-50 mg total cellular RNA (in 10 ml) with 1.5 ml 
10X Hybridization buffer (1.5 M KCl, 0.1 M TrisCl, pH 8.3, 10 mM EDTA) and 3.5 ml labeled 
oligonucleotide. Samples are heated to 65°C for 90 minutes and allowed to slow cool at room 
temperature. To each sample is added 30 ml of primer extension reaction mixture (0.9 ml Tris-Cl, pH 
8.3, 0.9 ml 0.5 M MgCl2, 0.25 ml DTT, 6.75 ml 1 mg/ml actinomycin D, 1.33 ml 5 mM 4dNTP mix, 20 
ml H2O, 0.2 ml 25 U/ml AMV reverse transcriptase). Samples are incubated for 1 hour at 42°C, and 
then, following the addition of 105 ml RNase reaction mix (100 mg/ml salmon sperm DNA, 20 mg/ml 
RNase A), for 15 minutes at 37°C. Samples are extracted in phenol/chloroform/isoamyl alcohol, 
ethanol precipitated, resuspended in stop/loading dye (20 mM EDTA, pH 8.0, 0.05% bromophenol 
blue, 0.05% xylene cyanol in formamide), heated at 65°C and analyzed by electrophoresis on a 9% 
acrylamide/7 M urea gel and autoradiography. 

In Situ Hybridization 

Cytological techniques well known in the art can be used to determine the temporal and spatial 
expression patterns of mRNA (in situ hybridization of tissue sections) and protein 
(immunohistochemistry in individual cells). 


Preparation of histological samples 

Tissue samples intended for use in in situ detection of either RNA or protein are fixed using
conventional reagents; such samples may comprise whole or squashed cells, or sectioned tissue.
Fixatives useful for such procedures include, but are not limited to, formalin, 4% paraformaldehyde in
an isotonic buffer, formaldehyde (each of which confers a measure of RNAase resistance to the
nucleic acid molecules of the sample) or a multi-component fixative, such as FAAG (85% ethanol,
4% formaldehyde, 5% acetic acid, 1% EM grade glutaraldehyde). For the detection of RNA, water
used in the preparation of an aqueous component of a solution to which the tissue is exposed until it is
embedded is RNAase-free, i.e. treated with 0.1% diethylpyrocarbonate (DEPC) at room temperature
overnight and subsequently autoclaved for 1.5 to 2 hours. Tissue will be fixed at 4°C, either on a
sample roller or a rocking platform, for 12 to 48 hours in order to allow the fixative to reach the center
of the sample.

Prior to embedding, excess fixative will be removed and the sample will be dehydrated by a
series of two- to ten-minute washes in increasingly high concentrations of ethanol, beginning at 60%
and ending with two washes in 95% and another two in 100% ethanol, followed by two ten-minute
washes in xylene. Samples will be embedded in one of a variety of sectioning supports, e.g. paraffin,
plastic polymers or a mixed paraffin/polymer medium (e.g. Paraplast® Plus Tissue Embedding
Medium, supplied by Oxford Labware). For example, fixed, dehydrated tissue will be transferred from
the second xylene wash to paraffin or a paraffin/polymer resin in the liquid phase at about 58°C. The
paraffin or paraffin/polymer resin will be replaced three to six times over a period of approximately
three hours to dilute out residual xylene. The sample will be incubated overnight at 58°C under a
vacuum, in order to optimize infiltration of the embedding medium into the tissue. The next day,
following several additional changes of medium at 20-minute to one-hour intervals, also at 58°C, the
tissue sample will be positioned in a sectioning mold, the mold will be surrounded by ice water and the
medium will be allowed to harden. Sections of 6 µm thickness will be taken and affixed to 'subbed'
slides, which are slides coated with a proteinaceous substrate material, usually bovine serum albumin
(BSA), to promote adhesion. Other methods of fixation and embedding are also applicable for use
according to the methods of the invention; examples of these are found in Humason, G.L., 1979,
Animal Tissue Techniques, 4th ed. (W.H. Freeman & Co., San Francisco), as is frozen sectioning
(Serrano et al., 1989, supra).

In situ Hybridization Analysis 

According to the method of in situ hybridization, a specifically labeled nucleic acid probe is
hybridized to cellular RNA present in individual cells or tissue sections. In situ hybridization can be
performed on either paraffin or frozen sections. Depending on the desired sensitivity and resolution,
either film or emulsion autoradiography can be utilized to detect the hybridized radioactive probe.

The following method of in situ hybridization is performed by incubating slides containing cell
or tissue specimens in a slide rack contained within a glass staining dish. According to this method, it is
preferable to use solutions that have been prepared fresh. Prior to the hybridization steps, slides are
dewaxed to remove the sectioning support material. The dewaxing protocol involves sequential
washes in xylene, rehydration by sequential washes in 100%, 95%, 70% and 50% ethanol, and
denaturation in 0.2 N HCl. Following a heat denaturation step (70°C in 2X SSC), samples are postfixed
in a freshly prepared solution of 4% PFA, washed in PBS, incubated in 10 mM DTT (10 min at 45°C)
and blocked in 400 ml PBS containing 0.617 g DTT, 0.74 g iodoacetamide and 0.5 g N-ethylmaleimide
for 30 min at 45°C in a water bath covered with aluminum foil, because iodoacetamide and
N-ethylmaleimide are light sensitive. The samples are washed in PBS and equilibrated sequentially in
freshly prepared 0.1 M triethanolamine (TEA buffer), TEA buffer/0.25% acetic anhydride, and TEA
buffer/0.5% acetic anhydride. Following a blocking step in 2X SSC, the samples are dehydrated by
sequential washes in 50%, 70%, 95%, and 100% ethanol and air dried. 35S-labeled riboprobes and
competitor probes prepared in the absence of a radiolabel (prepared as described in Section B entitled
"Production of a Polynucleotide Sequence") or double-stranded DNA probes (prepared with
[35S]dNTPs by methods well known in the art including nick translation or random
oligonucleotide-primed synthesis) are heated to 100°C for 3 min and diluted to a final probe
concentration of 0.3 µg/ml in 50% formamide, 0.3 M NaCl, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA, 1X
Denhardt solution, 500 µg/ml yeast tRNA, 500 µg/ml poly(A) (Pharmacia), 50 mM DTT, 10%
polyethylene glycol (MW 6000). The hybridization step is carried out by covering the sample with an
appropriate amount of probe, and incubating for 30 min to 4 hours at 45°C in a chamber designed to
prevent dilution or concentration of the hybridization solution. Samples are washed sequentially at
55°C in solution A (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol) and solution B (50%
(v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol, 0.5% (v/v) Triton-X-100) and at room
temperature in solution C (2X SSC, 20 mM 2-mercaptoethanol). Following a 15 minute incubation
with RNase, samples are washed at 50°C in solution C, and at room temperature in 2X SSC. Samples
are dehydrated by sequential washes in 50% ethanol/0.3 M ammonium acetate, 70% ethanol/0.3 M
ammonium acetate, 95% ethanol/0.3 M ammonium acetate, and 100% ethanol. Slides are air dried and
analyzed by film or by emulsion autoradiography (Ausubel et al., supra).
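
As a quick arithmetic check on the blocking solution described above, the three masses dissolved in 400 ml each work out to roughly 10 mM. The sketch below is illustrative only: the molar masses (DTT 154.25 g/mol, iodoacetamide 184.96 g/mol, N-ethylmaleimide 125.13 g/mol) are standard values that are not stated in the text, the N-ethylmaleimide mass is read as 0.5 g, and the function and variable names are hypothetical.

```python
def millimolar(mass_g, molar_mass_g_per_mol, volume_ml):
    """Return the concentration in mM of a solute mass dissolved in a given volume."""
    moles = mass_g / molar_mass_g_per_mol
    litres = volume_ml / 1000.0
    return moles / litres * 1000.0


# (mass in grams, molar mass in g/mol); molar masses are assumed standard
# values and do not come from the protocol text.
BLOCKING_REAGENTS = {
    "DTT": (0.617, 154.25),
    "iodoacetamide": (0.74, 184.96),
    "N-ethylmaleimide": (0.5, 125.13),
}


def blocking_solution_concentrations(volume_ml=400.0):
    """Concentration in mM of each blocking reagent in the stated PBS volume."""
    return {
        name: millimolar(mass, molar_mass, volume_ml)
        for name, (mass, molar_mass) in BLOCKING_REAGENTS.items()
    }


if __name__ == "__main__":
    for name, conc in blocking_solution_concentrations().items():
        print(f"{name}: {conc:.2f} mM")
```

Under these assumptions each reagent comes out within about 0.1% of 10 mM, matching the 10 mM DTT used in the preceding incubation step.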

iii. mRNA Stability/Control of Turnover and mRNA Transcription Rate 

Changes in mRNA stability/control of turnover and mRNA transcription rates due to the
presence of a polymorphism can be detected by the following methods.

mRNA Stability 

Gene expression can be regulated by variations in mRNA stability (Liebhaber, 1997, Nucleic
Acids Symp Ser., 36:29 and Ross J. 1996, Trends Genet, 5:171). Any gene variation occurring within
the cis-acting elements which control mRNA abundance may influence gene expression levels (Peltz
et al., 1992, Curr Opin Cell Biol., 4:979). Quantitative RT-PCR (Kohler et al., 1995, Quantitation of
mRNA by polymerase chain reaction, Springer) and mRNA radiolabelling techniques are two methods
for measuring relative mRNA abundance and stability. Quantitative PCR employs an internal standard
to provide a direct comparison between alternative reactions, enabling comparison of low abundance
transcripts or transcripts derived from a sample that is only available in a limited quantity (McPherson
MJ et al., eds, 1995, PCR 2: A Practical Approach, IRL Press).
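
One way such measurements might be turned into a stability estimate is sketched below. It is a minimal illustration, not a method taken from the text: it assumes the target signal at each time point is divided by the signal of the co-amplified internal standard, that the time course follows a block of new transcription, and that decay is first order, so that the logarithm of abundance falls linearly with time. All function names are hypothetical.

```python
import math


def normalize_to_standard(target_signals, standard_signals):
    """Divide each target signal by the internal standard signal from the same reaction."""
    return [t / s for t, s in zip(target_signals, standard_signals)]


def mrna_half_life(times, abundances):
    """Estimate mRNA half-life from a decay time course.

    Fits ln(abundance) against time by ordinary least squares and converts
    the decay constant to a half-life. Assumes first-order decay.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = s_tl / s_tt
    if slope >= 0:
        raise ValueError("abundance does not decrease over time")
    return math.log(2) / -slope
```

Comparing the half-life obtained for transcripts carrying each allele of a polymorphism, under the same assumptions, would give a relative measure of their stability.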

Assay for mRNA Transcription Rates 

Genetic polymorphism within the regulatory regions of a gene can significantly alter 
transcription rate and mRNA stability, resulting in reduced biological activity of the encoded protein. 
One of the most sensitive assays for measuring the rate of gene transcription is the nuclear runoff 
assay (Groudine and Casimir, 1984, Nucleic Acids Res 12: 1427). Nuclei isolated from cell lines 
expressing the target gene of interest are treated with radiolabeled UTP and the level of incorporation 
of radiolabel into nascent RNA transcripts is determined by filter hybridization to immobilized cDNA 
derived from the target gene. 

iv. Intracellular mRNA Localization 

A genetic variation can cause a change in the localization of a particular mRNA species (e.g. 
to the cytoskeleton, or to the nuclear scaffold). 

Immunohistochemistry

Changes in RNA localization can be detected by immunohistochemical methods well known in 
the art (e.g. in situ analysis described above). 

Oocyte Injection Assays 

In many cases mRNA, like protein, is localized in relation to the polarity of the cell or the 
cytoskeletal architecture (St. Johnston, 1995, Cell, 81:161). The Xenopus oocyte is a popular, 
experimentally tractable, system for studying intracellular trafficking of mRNA (Nakielny et al., 1997, 
Annu. Rev. Neurosci., 20:269). Fluorescently labelled RNA is microinjected into the large oocyte cell 
where its location can be detected using standard microscopy methods. Polymorphic variants of a 
particular mRNA species may differ in their response to cellular mechanisms responsible for 
partitioning mRNA within the cell. This method has been useful for demonstrating that sequence 
variations can affect sub-cellular localization (Grimm et al., 1997, EMBO J., 16:793).

v. Post-Translational Alterations

Post-translational alterations resulting from premature stop codons, translational readthrough
or multiple open reading frames and translational suppression may occur as a result of a
polymorphism. To detect post-translational alterations, a polynucleotide comprising one or more
polymorphisms is subjected to in vitro transcription and in vitro translation (as described in sections B
and J entitled "Production of a Polynucleotide Sequence" and "Preparation of a Labeled Protein").
The translation product(s) are analyzed for the appearance of aberrantly sized proteins. Additional
post-translational alterations that may occur as a result of a polymorphism include changes in
localization due to an altered signal sequence, and changes in glycosylation, myristoylation, and
susceptibility to or sites of proteolytic cleavage.

The method of immunocytochemistry can be used to determine if a protein is incorrectly
localized, due to the presence of an altered signal sequence.


Immunohistochemistry

Immunohistochemical techniques, including indirect immunofluorescence, immunoperoxidase
labeling or immunogold labeling, are used for protein localization.

Immunofluorescent labeling of tissue sections (prepared as for in situ analysis, described
above) is performed by the following method. Slides containing the sample of interest are equilibrated
to room temperature, washed in PBS, incubated with an appropriate dilution of primary antibody (1
hour at room temperature), washed in PBS, incubated with an appropriate dilution of secondary
antibody (1 hour at room temperature), washed in PBS and analyzed under a microscope (Ausubel et
al., supra). Alternatively, the sensitivity of the immunohistochemical reaction is increased by using a
streptavidin-secondary antibody conjugate reacted with a biotin-fluorochrome conjugate. Alternatively,
immunogold labeling is used to detect a protein of interest by using an immunogold-conjugated
secondary antibody.

Immunoperoxidase labeling of tissue sections is performed by the following method. Slides are
pretreated in 0.25% hydrogen peroxide, incubated with primary antibody, washed in PBS and
incubated (1 hour at room temperature) with a specific secondary bridging antibody capable of
recognizing both the primary antibody and a horseradish peroxidase-antiperoxidase (PAP) complex.
The slides are washed in PBS and developed in diaminobenzidine substrate solution (0.03% (w/v) 3,3'
diaminobenzidine in 200 ml PBS) at room temperature (Ausubel et al., supra).

Alternatively, protein localization is determined by cell fractionation wherein cells are
biosynthetically labeled, the labeled material is fractionated, and the radiolabeled proteins in each
fraction are analyzed by immunoprecipitation with an antibody specific for the protein of interest.



Assay for Glycosylation Inhibition

Changes in protein glycosylation can be detected by radiolabelling a protein of interest with
sugars, determining if a change in the cellular localization (by immunocytochemistry) of the protein in
culture has occurred due to aberrant glycosylation, or by determining the effects of inhibitors of
glycosylation on the migration pattern of proteins analyzed by polyacrylamide gel electrophoresis.

Post-translational glycosylation of proteins plays an important role in defining protein function
(Baeziger, 1994, FASEB J., 13:1019; Jacob, 1995, Curr. Opin. Struct. Biol., 5:605). Protein
glycosylation can be inhibited by tunicamycin, an antibiotic, as well as by several sugar analogues
(Schwarz, 1991, Behring Inst Mitt., 89:198). These reagents are used to characterize the effects of
sequence changes on protein glycosylation.


Assay for Post-Translational Modification with Lipids 

Changes in protein modification with lipids (e.g. myristoylation) are detected by radiolabelling a
protein of interest with myristic acid or by determining if a change in the cellular localization of the
protein in culture has occurred as a result of aberrant lipid modification (by immunocytochemistry).

Covalent attachment of lipids is a mechanism by which eukaryotic cells direct and, in some
cases, control membrane localization of proteins (Casey, 1994, Curr. Opin. Cell. Biol., 2:219). Such
post-translational addition of myristyl, palmityl or prenyl side-chains has a key role in the functional
regulation of many proteins (Chow et al., 1992, Curr. Opin. Cell. Biol., 4:629; Resh, 1994, Cell,
763:411). Assays for detecting proteins that are covalently modified by the attachment of lipids include
labeling with [3H]myristate (Stevenson et al., 1992, J. Exp. Med., 176:1053), or a combination of
enzymatic and chemical cleavage techniques performed in conjunction with tandem mass
spectrometry to determine sites of modification (Papac et al., 1992, J. Biol. Chem., 267:16889).



Proteolytic Cleavage 

Post-translational cleavage of polypeptides is an important mechanism for modulating protein
function in many physiological processes. Protease activity is involved in zymogen processing,
activation of enzyme catalysis, tissue/cell remodeling, signal transduction cascades, protein degradation
and cell death pathways (Rappay, 1989, Prog Histochem Cytochem., 18:1). A protein that is predicted
to be a protease or the target of a protease can be assayed in vitro using purified proteins or cell
extracts (Muta et al., 1995, J. Biol. Chem. 270:892) where cleavage efficiency is monitored by
standard PAGE or western blotting. Alternatively, proteases and/or their targets can be expressed
from expression plasmids in in vivo cell culture systems in order to monitor their biological activity
(Zhang et al., 1998, J. Biol. Chem. 273:1144). The specificity of proteolytic cleavage is determined
using inhibitors that selectively block serine, cysteine, aspartic and metallo proteolytic activity (e.g.
pepstatin A selectively inhibits aspartic proteases) (Rich et al., 1985, Biochemistry, 24:3165).

To determine if a protein has been modified such that the sites of proteolytic cleavage have
been altered, or susceptibility to proteolytic cleavage has changed, pulse chase experiments with
radiolabeled protein can be carried out to determine the precursor-product relationship following
digestion with a protease of a given specificity. The method of pulse chase labeling is described in
Ausubel et al., supra. Alternatively, inhibitors of proteases (e.g. acid proteases or serine proteases) can
be used to identify protease cleavage sites.

vi. Changes in Receptor Properties 

If the gene of interest encodes a receptor protein, a polymorphism may modify the properties 
of the receptor such that receptor binding/turnover or activation is altered. Receptor formation can be 
impaired if a polymorphism causes improper receptor localization or assembly. 

Receptor Localization 

To determine if a receptor protein is being expressed at the proper location (e.g. nucleus, 
cytoplasm, cell surface), the receptor can be localized by immunocytochemical techniques. 
Alternatively, cells that are expressing the receptor can be fractionated and subjected to Western blot 
analysis or biosynthetically labeled, fractionated and analyzed by immunoprecipitation. 

Protein-Protein Interactions/In vitro Assembly Assays for Receptors 

A number of methods can be used to determine if a receptor is colocalized with the 
appropriate protein partner. 

The function of a protein may be dependent on the ability of the protein to interact with other
proteins as part of a large complex. For example, certain cell surface receptors consist of a receptor
complex that is composed of several homo- or heteromeric protein subunits, and activation by ligand
can result in altered protein-protein interactions both within the receptor complex and with
"downstream" targets such as G-proteins (Okada and Pessin, 1996, J. Biol. Chem., 271:25533).
Protein-protein interactions can be assayed immunologically by coimmunoprecipitation of native
(Gilboa et al., 1998, J. Biol. Chem., 140:767) or chemically cross-linked complexes (Haniu et al., 1997,
J. Biol. Chem., 272:25296), or through protein-protein mobility shift assays (Stern and Frieden, 1993,
Anal. Biochem., 212:221). If all of the components of a receptor complex have been identified, one
can employ in vitro reconstitution assays to assess whether a single protein alteration can affect the
functioning of the entire complex (Durovic et al., 1994, J. Biol. Chem., 269:30320).

Assay for In Vitro Assembly of Multimeric Protein Complexes

To determine whether these genetic variations have affected protein complex assembly,
experiments are carried out wherein recombinant mutant subunits are transfected into cells and
coexpressed with the other subunit components in vitro. Proper assembly is assessed by
immunoprecipitation of the protein complex in question with antibodies specific for the various
members of the complex followed by PAGE analysis (Koster et al., 1998, Biophys. J., 74:1821).

Assay for Receptor Binding/Turnover

Receptor-ligand interaction is essential for the functionality of the bound complex. Genetic
changes that alter either ligand or receptor can dramatically affect receptor binding, turnover, and
subsequent activation of downstream signaling events. Receptor binding/turnover can be measured by
standard Scatchard analysis of radiolabelled ligand binding in vitro (Culouscou et al., 1993, J. Biol.
Chem. 268:10458) or in cellular based assays (Greenlund et al., 1993, J. Biol. Chem. 268:18103).
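
The Scatchard analysis mentioned above can be sketched as follows. This is a minimal illustration assuming a single class of non-interacting binding sites, for which bound/free plotted against bound gives a straight line with slope -1/Kd and x-intercept Bmax. The function name and the input layout are hypothetical, and bound and free ligand must be expressed in consistent concentration units.

```python
def scatchard_fit(bound, free):
    """Estimate (Kd, Bmax) from paired bound and free ligand concentrations.

    Fits bound/free against bound by ordinary least squares. Assumes one
    class of independent binding sites, so the Scatchard plot is linear.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two matched measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope           # slope of the Scatchard line is -1/Kd
    bmax = -intercept / slope   # x-intercept of the line is Bmax
    return kd, bmax
```

A change in Kd between receptor variants would point to altered affinity, while a change in Bmax would point to an altered number of available binding sites. Curved Scatchard plots violate the single-site assumption and are better handled by nonlinear fitting.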

Ligand Binding as Measured by Affinity Chromatography 

Alternatively, affinity chromatography methods (well known in the art) can be employed to
determine if a receptor is demonstrating aberrant binding characteristics. According to the method of
affinity chromatography, receptor-ligand interactions are allowed to occur, and the binding efficiency
of receptor and ligand and/or turnover of receptor-ligand complexes is measured. Alternatively,
affinity chromatography can be used to isolate one or more components of a receptor-ligand
interaction for further analysis (March et al., 1974, Adv. Exp. Med. Biol., 42:3). The method of affinity
chromatography typically involves immobilizing on a solid support one component, for example a
known ligand for a receptor, and then incubating the immobilized ligand with radiolabelled protein
under optimal binding conditions. To measure the exact binding affinity of a given ligand-receptor pair,
an increasing amount of non-labeled competitor is added. This assay can be used to assess altered
binding efficiency resulting from the presence of a polymorphism in a protein of interest.
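
A competition series of this kind is often summarized by the competitor concentration that displaces half of the labeled protein (the IC50). The sketch below is one simple way to estimate it and is not taken from the text: it assumes bound label is expressed as a fraction of the signal obtained without competitor, that competitor concentrations are positive and listed in ascending order, and it interpolates linearly on a log concentration scale between the two points that bracket 50%. The function name is hypothetical.

```python
import math


def ic50_by_interpolation(competitor_conc, bound_fraction):
    """Estimate the competitor concentration giving 50% displacement.

    competitor_conc: ascending, strictly positive concentrations.
    bound_fraction: labeled protein bound at each concentration, as a
    fraction of the binding seen with no competitor.
    """
    points = list(zip(competitor_conc, bound_fraction))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        # Look for the pair of neighbouring points that brackets 0.5.
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + t * (log_hi - log_lo))
    raise ValueError("bound fraction never crosses 0.5 in the tested range")
```

Comparing the IC50 obtained for the protein encoded by each allele, measured under identical conditions, gives a relative indication of altered binding; converting an IC50 to an absolute affinity requires further assumptions about the labeled species.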

Receptor Activation Assays: Phosphorylation, Kinase Activity and Mitogenic Stimulation

Almost all signaling that occurs through cell surface receptors is regulated by phosphorylation,
a reversible post-translational event that occurs at specific amino acid residues and is catalyzed by a
protein kinase activity present within the receptor itself (autophosphorylation) or in trans via direct
interaction with an associated kinase (Hunter, 1997, Philos Trans R Soc Lond B Biol Sci., 353:583).
The specific effect of phosphorylation on a biological activity depends on the receptor, but often
results in modulation of endogenous receptor kinase activity or interaction with associated proteins,
which are also often kinases. The results of a phosphorylation event are passed on through a cascade
of protein kinases/phosphatases which ultimately affect downstream processes controlling gene
transcription, cell proliferation, metabolism, movement and differentiation (Patarca, 1996, Crit Rev
Oncog., 7:343). The biological function of a receptor is usually assayed in cell culture following
overexpression. The phosphorylated state of a receptor can be assayed directly by immunological
methods by employing an antibody that specifically recognizes a phosphorylated residue (Bangalore,
1992, Proc Natl Acad Sci USA., 89:11637). Endogenous kinase activity associated with a receptor is
measured via the incorporation of radiolabeled phosphate in immunoprecipitated receptor complex
(Kazlauskas and Cooper, 1989, Cell 58:1121). "Downstream" events of receptor activity, including
mitogenic stimulation or MAP kinase activity, can be measured by tritiated thymidine incorporation (Luo
et al., 1996, Cancer Res. 56:4983), or by mobility-shift analysis of MAP kinase on western blots (Vietor,
1993, J. Biol. Chem. 268:18994), respectively.

Immunocytochemical methods can be used to determine if a receptor-ligand complex is
correctly translocated to the nucleus. Alternatively, nuclear preparations (prepared as described
below) can be analyzed by Western blot or immunoprecipitation for the presence of the receptor
protein.

If a receptor is a transcriptional activator, the ability of the receptor to induce gene expression
can be measured by a variety of methods including Northern blot analysis, or reporter gene assays
wherein the promoter region isolated from a gene that is activated by the receptor regulates the
expression of a reporter protein.

vii. Enzyme Catalysis 

The gene of interest may encode a protein that has an enzymatic activity wherein the enzyme
catalyzes a reaction that is critical to the general metabolism of a cell. To determine if a mutated
protein is impaired in its enzymatic function, assays can be performed to measure the enzymatic
activity of the protein. There are many important enzymatic activities associated with normal cellular
metabolism, including glycosidation, esterification, amidation, hydroxylation, acetylation, sulfonylation
and alkylation. Each of these activities is assayed using in vitro methods employing overexpressed or
purified proteins, well known in the art (Eisenthal and Danson, 1992, Enzyme Assays: A Practical
Approach, Rickwood et al., Eds., IRL Press, Oxford, England).

The protein of interest may also be involved in various aspects of DNA synthesis or
replication. In vitro assays for the enzymatic reactions involved in DNA synthesis or replication (e.g.
polymerase, ligase, exonuclease or helicase activity) are known in the art. The biological activity of the
proteins catalyzing these reactions is assayed in vitro using standard enzymatic techniques (Adams,
199, DNA Replication: A Practical Approach I, Rickwood et al., Eds., IRL Press, Oxford, England).

If the protein of interest is involved in glycolysis or energy transport, assays for measuring
transporter activity or the activity of ATP dependent pumps are useful, according to the invention, for
determining if a mutated protein is impaired in these functions.

Transporter Activity 

Mammalian cells possess a variety of transporter systems, for example amino acid
transporters, which have overlapping substrate specificity (Van Winkle et al., 1993, Biochim Biophys
Acta, 1154:157). To determine if a polymorphism in a candidate gene of interest has altered the
function of the protein product of this gene as a molecular transporter, the full-length cDNA clone is
isolated by standard expression cloning strategies, and a change in activity of the full-length cRNA or
antisense cRNA upon microinjection into Xenopus laevis oocytes is determined by measuring changes
in influx/efflux transport of radiolabeled amino acid molecules (Broer et al., 1995, Biochem J., 312(Pt
3):863), neurotransmitters or their metabolites.

ATP-Dependent Pump Activity

Mammalian cells possess a variety of molecules that are categorized as ATP-binding cassette
or ATP-dependent transporters or pumps. These include the Na+-K+-ATPase ion pump, the calcium
uptake pump, (K+ + H+)-ATPase and the human multidrug resistant protein termed P-glycoprotein.
Alterations in pump activity are investigated by expressing the clone specific for the pump protein(s)
of interest in Xenopus oocytes, and performing tracer studies which measure the changes in
ATP-dependent uptake or extrusion of a radiolabeled substrate, and changes in the coupling ratios (e.g.
moles substrate transported/mole ATP hydrolyzed) (Shapiro et al., 1998, Eur. J. Biochem., 254:189).

viii. Ion Channel

The gene of interest may encode a protein that is a component of an ion channel.
Immunocytochemical methods can be used to determine if an ion channel protein demonstrates the
appropriate cell type specificity.

The activity of an ion channel can be measured by electrophysiological methods in oocytes. 
Alternatively, the sensitivity of ion channel activity to a particular inhibitor can be determined. 



Assays for Ion Channel Activity in Oocytes 

Polymorphisms which alter ion channel function and regulation are studied using the oocytes
of Xenopus laevis. Injection of the oocytes with exogenous in vitro transcribed mRNA results in the
production and functional expression of foreign membrane proteins, including voltage- and
neurotransmitter-operated ion channels (Dascal et al., 1987, CRC Crit Rev Biochem., 224:317).
Changes in the oocyte transmembrane current in response to expression of an exogenous mRNA are
measured. This technique has been improved by the development of rapid superfusion systems that
utilize a dual role perfusion micropipette that controls internal solution as well as monitoring voltage
(Costa et al., 1994, Biophys J., 67:395). This technology represents a useful system for studying
various aspects of ion channels encoded by foreign mRNAs, including channel expression,
single-channel behavior, and the response of channels to the action of pharmacologically active
substances (Sigel, 1987, J. Physiol., 386:73).

Patch Clamp Assays for Ion Channel Activity

The function of individual channel proteins is determined by the high resolution patch clamp
technique. This technique (which is useful in a variety of cell types, including Xenopus oocytes
described above) involves measuring changes in transmembrane current across the cell membrane in
vitro (Sachs et al., 1983, Methods Enzymol., 103:147). Processes such as signaling, secretion, and
synaptic transmission are examined at the cellular level by the patch clamp method. The gene
expression pattern and protein structure of ionic channels can be determined by combining information
derived from high-resolution electrophysiological recordings obtained by the patch clamp method with
molecular biological analysis (Liem et al., 1995, Neurosurgery, 36:382).

A polymorphic variation in a gene that encodes a protein that is a member of a multimeric
protein complex, such as an ion channel or a cytoskeletal structural component, can alter the assembly
and function of the multimeric protein complex (Lee et al., 1994, Biophys J., 66:667). A gene variation
may affect protein-protein interaction, or disrupt the production of components of a multimeric
complex, thereby disrupting stoichiometry and consequently decreasing stability.

Assay for In Vitro Assembly of Multimeric Protein Complexes

In vitro assembly assays (described above) can be performed to determine if a polymorphism 
has affected the assembly of an ion channel. 

ix. Cellular Properties

The influence of a polymorphism on general aspects of cell behavior, including cell
morphology, adhesive properties, differentiation and proliferation, can be assessed using a combination
of methods including microscopic observation of cell cultures (Azuma et al., 1994, Histol. Histopathol.,
9:781), immunohistochemistry, and FACS analysis techniques (Beesley, 1993, Immunocytochemistry: A
Practical Approach, Rickwood et al., (Eds), IRL Press and Ormerod, 1994, Flow Cytometry: A
Practical Approach, Rickwood et al., (Eds), IRL Press, Oxford, England).

Assays for Measuring Apoptosis 

Apoptosis has been implicated in the etiology and pathophysiology of a variety of human
diseases. Gene variants which influence the process of apoptosis can be assessed by a variety of
methods of analysis involving either the tissues or cells (Allen et al., 1997, J Pharmacol Toxicol
Methods, 37:215). Cell cultures expressing the gene variants of interest are analyzed using Annexin V,
which interacts strongly with phosphatidylserine residues that have been exposed as a result of plasma
membrane breakdown occurring in the early stages of apoptosis. Either vital or fixed material can be
analyzed by Annexin V labeling in combination with microscopy and flow cytometry detection
methods (van Engeland et al., 1998, Cytometry, 31:1). TdT-mediated deoxyuridine triphosphate
(dUTP)-biotin nick end-labeling (TUNEL) is a preferred method for specific staining of apoptotic cells
in histological sections and cytology specimens (Labat-Moleur et al., 1998, J. Histochem Cytochem.,
46:327; Sasano et al., 1998, Diagn Cytopathol., 18:398). Apoptosis is also detected by quantification of
DNA fragmentation by ethidium bromide staining and gel electrophoresis, or by the use of saturation
labeling of 3' ends of DNA fragments (Peng and Liu, 1997, Lab Invest., 77:547).

Assay for In Vivo Receptor Function: Growth Cone Guidance Assay 

Activation of cell-surface receptors can result in the stimulation of cell motility. There are
many different families of signaling molecules, for example the netrins (Serafini et al., 1994, Cell 78:
409), which are responsible for both contact-mediated and chemo-mediated attraction and repulsion of
migrating cells. A classic model for this activity is the trajectory that the leading edge "growth cone"
takes when a neuron is stimulated to grow out from explanted neural tissue in cell culture (Goodman,
1996, Annu Rev Neurosci. 19:341). Ligands present in the culture medium or immobilized on a
substrate bind to receptors on the cell surface of the growth cone and trigger second-messenger
signals, thereby dictating an appropriate steering response. The biological activity of such receptors or
ligands can be measured by overexpressing the receptor or ligand protein in culture and then
monitoring growth cone guidance (Kremoser et al., 1995, Cell 82:359). Attraction or repulsion of cells
that differs from normal indicates a role for the protein in growth cone
guidance, and identifies the polymorphism as altering function.



x. Changes in gene expression or protein function that result from the presence of a 
polymorphism can be detected by in vivo assays including the production of transgenic animals, knock 
out animals or the analysis of naturally occurring animal models of a particular disease. 



Transgenic Animals 

Transgenic mice provide a useful tool for genetic and developmental biology studies and for 
the determination of a function of a novel sequence. According to the method of conventional 
transgenesis, additional copies of normal or modified genes are injected into the male pronucleus of the 
zygote and become integrated into the genomic DNA of the recipient mouse. The transgene is 
transmitted in a Mendelian manner in established transgenic strains. 

Constructs useful for creating transgenic animals comprise genes under the control of either 
their normal promoters or an inducible promoter, reporter genes under the control of promoters to be 
analyzed with respect to their patterns of tissue expression and regulation, and constructs containing 
dominant mutations, mutant promoters, and artificial fusion genes to be studied with regard to their 
specific developmental outcome. Transgenic mice are useful according to the invention for analysis of 
the dominant effects of overexpressing a candidate gene in the mouse. Typically, DNA fragments on the 
order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998, New. Anat, 
253:19). Transgenic animals can be created with a construct comprising a candidate gene containing 
one or more polymorphisms according to the invention. Alternatively, a transgenic animal expressing a 
candidate gene containing a single polymorphism can be crossed to a second transgenic animal 
expressing a candidate gene containing a different polymorphism and the combined effects of the two 
polymorphisms can be studied in the offspring animals. Transgenic mice engineered to overexpress a 
number of genes, including PCK1 (Valera et al., 1994, Proc. Natl. Acad. Sci. USA, 91: 9151), INS 
(Mitanchez et al., FEBS Letters, 421: 285), IAPP (D'Alession et al., 1994, Diabetes, 43:1457), Asp 
(Klebig et al., Proc. Natl. Acad. Sci. USA, 92: 4728) and Agrt (Graham et al., Nature Genetics, 
17:273), have been prepared and may be useful for studying osteoarthritis. 
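
The cross of two transgenic animals carrying different polymorphisms can be illustrated with a short calculation. The following is a minimal sketch only. It assumes that each parent is hemizygous for a single transgene insertion and that the two insertions are unlinked, so that each transgene reaches half of the offspring; the function name and the labels "TgA" and "TgB" are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_classes(parent_a=("TgA", "-"), parent_b=("TgB", "-")):
    """Expected offspring fractions for a cross of two hemizygous lines.

    Assumption (not from the text): each parent carries one copy of its
    transgene at a single locus unlinked to the other, so every gamete
    receives the transgene with probability 1/2.
    """
    classes = {}
    for gamete_a, gamete_b in product(parent_a, parent_b):
        # Record which transgenes this offspring class carries.
        carried = tuple(sorted(g for g in (gamete_a, gamete_b) if g != "-"))
        classes[carried] = classes.get(carried, Fraction(0)) + Fraction(1, 4)
    return classes


expected = offspring_classes()
for genotype, fraction in sorted(expected.items()):
    print(genotype or ("wild type",), fraction)
```

Under these assumptions one quarter of the offspring are expected to carry both transgenes, and these are the animals in which the combined effect of the two polymorphisms would be examined.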

Knock Out Animals 

i. Standard 

Knock out animals are produced by the method of creating gene deletions with homologous 
recombination. This technique is based on the development of embryonic stem (ES) cells that are 
derived from embryos, are maintained in culture and have the capacity to participate in the 
development of every tissue in the mouse when introduced into a host blastocyst. A knock out animal 
is produced by directing homologous recombination to a specific target gene in the ES cells, thereby 
producing a null allele of the gene. The potential phenotypic consequences of this null allele (either in 
heterozygous or homozygous offspring) can be analyzed (Reeves, supra). Single or double knock out 
mice that may be useful for studying osteoarthritis have been produced for a number of genes 
including IRS1 (Araki et al., 1994, Nature, 372:186; Tamemoto et al., 1994, Nature, 372:182), IRS2 
(Withers et al., 1998, Nature, 391:900), INSR, BIRKO, MIRKO, INSR (Lamothe et al., 1998, FEBS 
Letters, 426:381), GLUT2, GLUT4 (Katz et al., 1995, Nature, 377:151), GLP1R (Gallwitz and Schmidt, 
1997, Z. Gastroenterol., 35:655), GCK (Sakura et al., 1998, Diabetologia, 41:654), GCK/IRS1, 
IRS1/INSR, MC4R (Huszar et al., 1997, Cell, 88:131) and BRS3 (Ohki-Hamazaki et al., 1997, 
Nature, 390:165). 

ii. In vivo Tissue Specific Knock Out in Mice Using Cre-lox. 

The method of targeted homologous recombination has been improved by the development of 
a system for site-specific recombination based on the bacteriophage P1 site-specific recombinase Cre. 
The Cre-loxP site-specific DNA recombinase from bacteriophage P1 is used in transgenic mouse 
assays in order to create gene knockouts restricted to defined tissues or developmental stages. 
Regionally restricted genetic deletion, as opposed to global gene knockout, has the advantage that a 
phenotype can be attributed to a particular cell/tissue (Marth, 1996, J. Clin. Invest. 97: 1999). In the 
Cre-loxP system one transgenic mouse strain is engineered such that loxP sites flank one or more exons of 
the gene of interest. Homozygotes for this so-called 'floxed gene' are crossed with a second 
transgenic mouse that expresses the Cre gene under control of a cell/tissue type transcriptional 
promoter. Cre protein then excises DNA between loxP recognition sequences and effectively 
removes target gene function (Sauer, 1998, Methods, 14:381). There are now many in vivo examples 
of this method, including the inducible inactivation of mammary tissue specific genes (Wagner et al., 
1997, Nucleic Acids Res., 25:4323). 
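
The excision step can be pictured at the sequence level. The sketch below is illustrative only: it assumes two loxP sites in the same orientation on one strand and complete recombination, and it does not model tissue-restricted Cre expression. The function name and the flanking test sequences are hypothetical; the 34 bp loxP sequence is the commonly cited canonical site.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34 bp loxP site


def cre_excise(sequence: str) -> str:
    """Return the sequence left after Cre recombination of a floxed segment.

    Assumptions (illustrative only): only the first two loxP sites are
    considered, both lie on the same strand in the same orientation, and
    recombination goes to completion. The DNA between the sites is removed
    and a single loxP site remains.
    """
    first = sequence.find(LOXP)
    if first == -1:
        return sequence  # no loxP site: nothing to recombine
    second = sequence.find(LOXP, first + len(LOXP))
    if second == -1:
        return sequence  # a single site cannot be recombined
    return sequence[: first + len(LOXP)] + sequence[second + len(LOXP):]


# Hypothetical floxed allele: flank + loxP + "exon" + loxP + flank.
floxed = "GGGG" + LOXP + "CCCCCCCC" + LOXP + "TTTT"
recombined = cre_excise(floxed)
print(recombined == "GGGG" + LOXP + "TTTT")
```

In cells that do not express Cre the floxed allele stays intact, which is what restricts the deletion to the tissue in which the chosen promoter is active.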

iii. BAC Rescue of Knock Out Phenotype 

In order to verify that a particular genetic polymorphism/mutation is responsible for altered 
protein function in vivo one can "rescue" the altered protein function by introducing a wild-type copy 
of the gene in question. In vivo complementation with bacterial artificial chromosome (BAC) clones 
expressed in transgenic mice can be used for these purposes. This method has been used for the 
identification of the mouse circadian Clock gene (Antoch et al, 1997, Cell 89: 655). 

iv. Naturally Occurring Animal Models 

Naturally occurring animal models useful for studying osteoarthritis include models of severe 
hyperglycaemia (Celebes black ape, Chinese hamster, diabetes mouse (db), Djungarian hamster, 
Egyptian sand rat, Hartley guinea pig, OLETF rat, New Zealand white rabbit, obese BBZ/Wor rat, 
rhesus monkey, South African hamster, spiny mouse), models for moderate hyperglycaemia (Cohen 
diabetic rat, GK rat, Japanese KK mouse, male Bristol CBA/Ca mouse, male eSS rat, male WKY 
fatty rat, male Wistar WBN/Kob rat, male ZDF rat, NZO mouse, obese mouse (ob), PBB/Ld mouse, 
spontaneously hypertensive corpulent (SHR/N-cp) rat, Tuco-tuco, Wellesley hybrid mouse, yellow 
obese mouse) and impaired glucose tolerance (ageing laboratory rats and mice, BHE rat, Fatty Zucker 
rat (fa), Mongolian gerbil, NON diabetic mouse, squirrel monkey, Yucatan miniature swine) (Pickup 
and Williams, eds., Textbook of Diabetes, 2nd Edition, Blackwell Science). 

G. Production of an Amplified Product 

Amplified products useful according to the invention can be prepared by utilizing the method 
of PCR as described in Section B entitled "Production of a Polynucleotide Sequence". Primers useful 
for producing an amplified product according to the invention (e.g. an amplified product comprising 
one or more polymorphisms) can be designed and synthesized as described in Section A entitled 
"Design and Synthesis of Oligonucleotide Primers". 

The invention provides methods (e.g. Southern blot analysis, PCR, primer extension and 
oligonucleotide hybridization) of detecting a polymorphism in an amplified product. 

H. Production of a Mutant Protein 

1. Expression of the Nucleotide Sequence 

In accordance with the present invention, polynucleotide sequences which encode candidate 
gene protein fragments, fusion proteins or functional equivalents thereof may be used in recombinant 
DNA molecules that direct the expression of a candidate gene protein in appropriate host cells. Due to 
the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the 
same or a functionally equivalent amino acid sequence may be used to clone and express the 
candidate gene protein. As will be understood by those of skill in the art, it may be advantageous to 
produce candidate gene-encoding nucleotide sequences possessing non-naturally occurring codons. 
Codons preferred by a particular prokaryotic or eukaryotic host (Murray et al., 1989, Nucleic Acid 
Res 17:477) can be selected, for example, to increase the rate of protein expression or to produce 
recombinant RNA transcripts having desirable properties, such as a longer half-life as compared to 
transcripts produced from the naturally occurring sequence. 
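
Because the genetic code is degenerate, a coding sequence can be rewritten with host-preferred codons without changing the encoded protein. The following sketch illustrates the idea under stated assumptions: the standard genetic code is used, and the `PREFERRED` table is a hypothetical placeholder rather than measured codon usage for any host. The function names are likewise illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host preferences, for illustration only; a real table would
# come from codon usage data for the chosen host.
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict = PREFERRED) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons)


original = "ATGCTAAGATAA"  # hypothetical sequence: Met-Leu-Arg-stop
optimized = recode(original)
print(optimized, translate(optimized) == translate(original))
```

Comparing the translations before and after recoding is a simple check that only synonymous changes were introduced.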

The nucleotide sequences of the present invention can be engineered in order to alter a 
candidate gene-encoding sequence for a variety of reasons, including but not limited to, alterations 
which modify the cloning, processing and/or expression of the gene product. For example, mutations 
may be introduced using techniques which are well known in the art, e.g., site-directed mutagenesis to 
insert new restriction sites, to alter glycosylation patterns, to change codon preference or to produce 
splice variants. 

In another embodiment of the invention, a natural, modified or recombinant candidate gene 
protein-encoding sequence may be ligated to a heterologous sequence to encode a fusion protein (as 
described in Section B entitled "Production of a Polynucleotide Sequence"). For example, for 
screening of peptide libraries for inhibitors of candidate gene protein activity, it may be useful to 
encode a chimeric protein that is recognized by a commercially available antibody. A fusion protein 
may also be engineered to contain a cleavage site located between a candidate protein and the 
heterologous protein sequence, so that the protein of interest may be substantially purified away from 
the heterologous moiety following cleavage. 

In another embodiment of the invention, the sequence encoding the candidate gene protein 
may be synthesized, whole or in part, using chemical methods well known in the art (see Caruthers, et 
al., 1980, Nuc Acids Res Symp Ser, 7:215, Horn, et al., 1980, Nuc Acids Res Symp Ser, 225, etc.). 
Alternatively, the protein itself, or a portion thereof, could be produced using chemical methods of 
synthesis. For example, peptide synthesis can be performed using various solid-phase techniques 
(Roberge, et al., 1995, Science, 269:202) and automated synthesis may be achieved, for example, using 
the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the 
manufacturer. 

The newly synthesized peptide can be substantially purified by preparative high performance 
liquid chromatography (e.g., Creighton, 1983, Proteins, Structures and Molecular Principles, WH 
Freeman and Co. New York NY). The composition of the synthetic peptides may be confirmed by 
amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra). 
Additionally the amino acid sequence of interest, or any part thereof, may be altered during direct 
synthesis and/or combined using chemical methods with sequences from other proteins, or any part 
thereof, to produce a variant polypeptide. 

2. Expression Systems 

In order to express a biologically active protein, the nucleotide sequence encoding the protein 
of interest, or its functional equivalent, is inserted into an appropriate expression vector, i.e., a vector 
which contains the necessary elements for the transcription and translation of the inserted coding 
sequence. 

Methods which are well known to those skilled in the art can be used to construct expression 
vectors containing a protein-encoding sequence and appropriate transcriptional or translational 
controls. These methods include in vivo recombination or genetic recombination. Such techniques are 
described in Ausubel et al., supra and Sambrook et al., supra. 

A variety of expression vector/host systems may be utilized to contain and express a protein 
product of a candidate gene according to the invention. These include but are not limited to 
microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid or cosmid 
DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems 
infected with virus expression vectors (e.g., baculovirus); plant cell systems transfected with virus 
expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed 
with bacterial expression vectors (e.g., Ti or pBR322 plasmid); or animal cell systems. 

The "control elements" or "regulatory sequences" of these systems vary in their strength and 
specificities and are those nontranslated regions of the vector, enhancers, promoters, and 3' 
untranslated regions, which interact with host cellular proteins to carry out transcription and 
translation. Depending on the vector system and host utilized, any number of suitable transcription and 
translation elements, including constitutive and inducible promoters, may be used. For example, when 
cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the Bluescript® 
phagemid (Stratagene, La Jolla CA) or pSport1 (Gibco BRL) and ptrp-lac hybrids and the like may be 
used. The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers 
derived from the genomes of plant cells (e.g., heat shock, RUBISCO, and storage protein genes) or 
from plant viruses (e.g., viral promoters or leader sequences) may be cloned into the vector. In 
mammalian cell systems, promoters from mammalian genes or from mammalian viruses are most 
appropriate. If it is necessary to generate a cell line that contains multiple copies of the sequence 
encoding the protein product of the gene of interest, vectors based on SV40 or EBV may be used with 
an appropriate selectable marker. 

In bacterial systems, a number of expression vectors may be selected depending upon the use 
intended for the protein of interest. For example, when large quantities of a protein are required for the 
production of antibodies, vectors which direct high level expression of fusion proteins that are readily 
purified may be desirable. Such vectors include, but are not limited to, the multifunctional E. coli 
cloning and expression vectors such as Bluescript® (Stratagene), in which the sequence encoding the 
protein of interest may be ligated into the vector in frame with sequences encoding the amino-terminal 
Met and the subsequent 27 residues of β-galactosidase so that a hybrid protein is produced; pIN 
vectors (Van Heeke & Schuster, 1989, J Biol Chem 264:5503); and the like. pGEX vectors (Promega, 
Madison WI) may also be used to express foreign polypeptides as fusion proteins with GST. In 
general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to 
glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins made in 
such systems are designed to include heparin, thrombin or factor XA protease cleavage sites so that 
the cloned polypeptide of interest can be released from the GST moiety at will. 

In the yeast, Saccharomyces cerevisiae, a number of vectors containing constitutive or 
inducible promoters such as alpha factor, alcohol oxidase and PGH may be used. For reviews, see 
Ausubel et al (supra) and Grant et al, 1987, Methods in Enzymology 153:516. 

In cases where plant expression vectors are used, the expression of a sequence encoding a 
protein of interest may be driven by any of a number of promoters. For example, viral promoters such 
as the 35S and 19S promoters of CaMV (Brisson et al., 1984, Nature 310:511) may be used alone or 
in combination with the omega leader sequence from TMV (Takamatsu et al., 1987, EMBO J 6:307). 
Alternatively, plant promoters such as the small subunit of RUBISCO (Coruzzi et al., 1984, EMBO J 
3:1671; Broglie et al., 1984, Science, 224:838) or heat shock promoters (Winter J and Sinibaldi RM, 
1991, Results Probl Cell Differ., 17:85) may be used. These constructs can be introduced into plant 
cells by direct DNA transformation or pathogen-mediated transfection. For reviews of such techniques, 
see Hobbs S or Murry LE in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill 
New York NY, pp 191-196 or Weissbach and Weissbach (1988) Methods for Plant Molecular 
Biology, Academic Press, New York, pp 421-463. 

An alternative expression system which could be used to express a protein of interest is an 
insect system. In one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is 
used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. 
The sequence encoding the protein of interest may be cloned into a nonessential region of the virus, 
such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion 
of the sequence encoding the protein of interest will render the polyhedrin gene inactive and produce 
recombinant virus lacking coat protein. The recombinant viruses are then used to infect S. 
frugiperda cells or Trichoplusia larvae in which the protein of interest is expressed (Smith et al., 
1983, J Virol 46:584; Engelhard, et al., 1994, Proc Natl Acad Sci 91:3224). 

In mammalian host cells, a number of viral-based expression systems may be utilized. In cases 
where an adenovirus is used as an expression vector, a sequence encoding the protein of interest may 
be ligated into an adenovirus transcription/translation complex consisting of the late promoter and 
tripartite leader sequence. Insertion in a nonessential E1 or E3 region of the viral genome will result in 
a viable virus capable of expressing in infected host cells (Logan and Shenk, 1984, Proc Natl Acad 
Sci, 81:3655). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, 
may be used to increase expression in mammalian host cells. 

Specific initiation signals may also be required for efficient translation of a sequence encoding 
the protein of interest. These signals include the ATG initiation codon and adjacent sequences. In 
cases where the sequence encoding the protein, its initiation codon and upstream sequences are 
inserted into the most appropriate expression vector, no additional translational control signals may be 
needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous 
translational control signals including the ATG initiation codon must be provided. Furthermore, the 
initiation codon must be in the correct reading frame to ensure translation of the entire insert. 
Exogenous transcriptional elements and initiation codons can be of various origins, both natural and 
synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate to 
the cell system in use (Scharf, et al., 1994, Results Probl Cell Differ, 20:125; Bittner et al., 1987, 
Methods in Enzymol, 153:516). 
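
The requirements on the initiation codon and the reading frame can be checked on an insert sequence before cloning. The sketch below is a minimal illustration, not a description of any particular vector system. It assumes the insert is given as the coding strand, written 5' to 3' and beginning at the intended initiation codon; the function name and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_insert(insert: str) -> list:
    """Return a list of problems that would prevent full-length translation.

    Illustrative checks only: the insert is assumed to be the coding strand,
    written 5' to 3', beginning at the intended initiation codon.
    """
    insert = insert.upper()
    problems = []
    if not insert.startswith("ATG"):
        problems.append("no ATG initiation codon at the start")
    if len(insert) % 3:
        problems.append("length is not a multiple of three")
    # Split into complete codons in the frame set by the first base.
    codons = [insert[i:i + 3] for i in range(0, len(insert) - 2, 3)]
    # A stop codon is expected only as the final codon.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"in-frame stop codon at codon {index + 1}")
            break
    return problems


print(check_insert("ATGGCTTAA"))     # in frame, starts with ATG
print(check_insert("ATGTAAGCTTAA"))  # premature stop in frame
```

An empty list means that, under these assumptions, the insert begins with ATG and can be read in frame through to its last codon.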

In addition, a host cell strain may be chosen for its ability to modulate the expression of the 
inserted sequences or to process the expressed protein in the desired fashion. Such modifications of 
the polypeptide include but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, 
lipidation and acylation. Post-translational processing which cleaves a "prepro" form of the protein 
may also be important for correct insertion, folding and/or function. Different host cells such as CHO, 
HeLa, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms for 
such post-translational activities and may be chosen to ensure the correct modification and processing 
of the introduced, foreign protein. 

For long-term, high-yield production of recombinant proteins, stable expression is preferred. 
For example, cell lines which stably express a foreign protein may be transformed using expression 
vectors which contain viral origins of replication or endogenous expression elements and a selectable 
marker gene. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in 
enriched media before they are switched to selective media. The purpose of the selectable marker is 
to confer resistance to selection, and its presence allows growth and recovery of cells which 
successfully express the introduced sequences. Resistant clumps of stably transformed cells can be 
expanded using tissue culture techniques appropriate to the cell type. 

Any number of selection systems may be used to recover transformed cell lines. These 
include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler, et al., 1977, Cell 
11:223) and adenine phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes which can be 
employed in tk- or aprt- cells, respectively. Also, antimetabolite, antibiotic or herbicide resistance can 
be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler 
et al., 1980, Proc Natl Acad Sci 77:3567); npt, which confers resistance to the aminoglycosides 
neomycin and G-418 (Colbere-Garapin et al., 1981, J Mol Biol., 150:1) and als or pat, which confer 
resistance to chlorsulfuron and phosphinothricin acetyltransferase, respectively (Murry, supra). 
Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole 
in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman and 
Mulligan, 1988, Proc Natl Acad Sci 85:8047). Recently, the use of visible markers has gained 
popularity with such markers as anthocyanins, β-glucuronidase and its substrate, GUS, and luciferase 
and its substrate, luciferin, being widely used not only to identify transformants, but also to quantify the 
amount of transient or stable protein expression attributable to a specific vector system (Rhodes et al., 
1995, Methods Mol Biol 55:121). 

3. Identification of Transformants Containing the Polynucleotide Sequence 

Although the presence/absence of marker gene expression suggests that the gene of interest 
is also present, its presence and expression should be confirmed. For example, if the sequence 
encoding a foreign protein is inserted within a marker gene sequence, recombinant cells containing the 
sequence encoding the foreign protein can be identified by the absence of marker gene function. 
Alternatively, a marker gene can be placed in tandem with the sequence encoding the foreign protein 
under the control of a single promoter. Expression of the marker gene in response to induction or 
selection usually indicates expression of the tandem sequences as well. 

Alternatively, host cells which contain the coding sequence for a protein of interest and 
express the protein of interest may be identified by a variety of procedures known to those of skill in 
the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and 
protein bioassay or immunoassay techniques which include membrane, solution, or chip based 
technologies for the detection and/or quantification of the nucleic acid or protein. 

The presence of the polynucleotide sequence encoding the protein of interest can be detected 
by DNA-DNA or DNA-RNA hybridization or amplification using probes, portions or fragments of the 
sequence encoding the foreign protein of interest. 

A variety of protocols for detecting and measuring the expression of the foreign protein, using 
either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples 
include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence 
activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal 
antibodies reactive to two non-interfering epitopes on the protein of interest is preferred, but a 
competitive binding assay may be employed. These and other assays are described in Hampton et al., 
1990, Serological Methods a Laboratory Manual, APS Press, St Paul MN and Maddox, et al., 1983, 
J Exp Med 158:1211. 

4. Purification of the Protein of Interest 

Host cells transformed with a nucleotide sequence encoding a protein of interest may be 
cultured under conditions suitable for the expression and recovery of the encoded protein from cell 
culture. The protein produced by a recombinant cell may be secreted or contained intracellularly 
depending on the sequence and/or the vector used. As will be understood by those of skill in the art, 
expression vectors containing a sequence encoding a protein of interest can be designed with signal 
sequences which direct secretion of the protein of interest through a prokaryotic or eukaryotic cell 
membrane. Other recombinant constructions may join the sequence encoding the protein of interest to 
the nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble 
proteins (Kroll et al., 1993, DNA Cell Biol, 12:441). 

The protein of interest may also be expressed as a recombinant protein with one or more 
additional polypeptide domains added to facilitate protein purification. Such purification-facilitating 
domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules 
that allow purification on immobilized metals, protein A domains that allow purification on immobilized 
immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex 
Corp, Seattle WA). The inclusion of cleavable linker sequences, such as Factor XA or enterokinase 
(Invitrogen, San Diego CA), between the purification domain and the protein of interest is useful for 
facilitating purification. One such expression vector provides for expression of a fusion protein 
comprising the sequence encoding a foreign protein and nucleic acid sequence encoding 6 histidine 
residues followed by thioredoxin and an enterokinase cleavage site. The histidine residues facilitate 
purification while the enterokinase cleavage site provides a means for purifying the foreign protein 
from the fusion protein. 
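
The layout of such a fusion protein, and the release of the foreign protein by cleavage, can be sketched at the amino acid level. This is an illustration under stated assumptions only: an initiating methionine is added before the histidine residues, the enterokinase site is taken to be the commonly used DDDDK motif with cleavage after the lysine, and the carrier and target sequences below are short hypothetical placeholders, not real thioredoxin or candidate protein sequences.

```python
HIS_TAG = "H" * 6              # six histidine residues
ENTEROKINASE_SITE = "DDDDK"    # enterokinase cleaves after the lysine


def build_fusion(carrier: str, target: str) -> str:
    """Assemble Met + His6 + carrier + enterokinase site + target protein."""
    return "M" + HIS_TAG + carrier + ENTEROKINASE_SITE + target


def enterokinase_release(fusion: str) -> tuple:
    """Split the fusion after the first DDDDK site.

    Returns (tag portion, released protein). Complete cleavage at a single
    site is assumed.
    """
    cut = fusion.find(ENTEROKINASE_SITE)
    if cut == -1:
        raise ValueError("no enterokinase site found")
    cut += len(ENTEROKINASE_SITE)
    return fusion[:cut], fusion[cut:]


# Hypothetical placeholders standing in for thioredoxin and the foreign protein.
fusion = build_fusion(carrier="GSGSGS", target="MKTAYIAK")
tag_part, released = enterokinase_release(fusion)
print(tag_part, released)
```

The histidine-containing portion stays with the carrier after cleavage, so a second pass over the immobilized metal would, in principle, retain it while the released protein flows through.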

In addition to recombinant production, fragments of the protein of interest may be produced by 
direct peptide synthesis using solid-phase techniques (Stewart et al., 1969, Solid-Phase Peptide 
Synthesis, WH Freeman Co., San Francisco; Merrifield, 1963, J Am Chem Soc, 85:2149). In vitro 
protein synthesis may be performed using manual techniques or by automation. Automated synthesis 
may be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer, 
Foster City CA) in accordance with the instructions provided by the manufacturer. Various fragments 
of a protein of interest may be chemically synthesized separately and combined using chemical 
methods to produce the full length molecule. 

I. Preparation of Antibodies 

Antibodies specific for the protein products of the candidate genes of the invention are useful 
for protein purification, for the diagnosis and treatment of various diseases (e.g. osteoarthritis) and for 
drug screening and drug design methods useful for identifying and developing compounds to be used in 
the treatment of various diseases (e.g. osteoarthritis). By antibody, we include constructions using the 
binding (variable) region of such an antibody, and other antibody modifications. Thus, an antibody 
useful in the invention may comprise a whole antibody, an antibody fragment, a polyfunctional antibody 
aggregate, or in general a substance comprising one or more specific binding sites from an antibody. 
The antibody fragment may be a fragment such as an Fv, Fab or F(ab')2 fragment or a derivative 
thereof, such as a single chain Fv fragment. The antibody or antibody fragment may be 
non-recombinant, recombinant or humanized. The antibody may be of an immunoglobulin isotype, e.g., IgG, 
IgM, and so forth. In addition, an aggregate, polymer, derivative and conjugate of an immunoglobulin 
or a fragment thereof can be used where appropriate. Neutralizing antibodies are especially useful 
according to the invention for diagnostics, therapeutics and methods of drug screening and drug 
design. 

Although a protein product (or fragment or oligopeptide thereof) of a candidate gene of the 
invention that is useful for the production of antibodies does not require biological activity, it must be 
antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of 
at least five amino acids and preferably at least 10 amino acids. Preferably, they should be identical to 
a region of the natural protein and may contain the entire amino acid sequence of a small, naturally 
occurring molecule. Short stretches of amino acids corresponding to the protein product of a candidate 
gene of the invention may be fused with amino acids from another protein such as keyhole limpet 
hemocyanin or GST, and antibody will be produced against the chimeric molecule. Procedures well 
known in the art can be used for the production of antibodies to the protein products of the candidate 
genes of the invention. 
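
The length criterion for immunizing peptides can be applied mechanically to a protein sequence. The sketch below simply lists every contiguous peptide of a chosen length that is identical to a region of the input protein; it applies only the length rule from the text and makes no prediction of antigenicity. The function name and the example sequence are hypothetical.

```python
def candidate_peptides(protein: str, length: int = 10) -> list:
    """List every contiguous peptide of the given length with its start position.

    Positions are 1-based. Only the length criterion is applied; whether a
    given peptide is actually antigenic is not assessed.
    """
    if length < 5:
        raise ValueError("peptides of at least five amino acids are required")
    return [
        (start + 1, protein[start:start + length])
        for start in range(len(protein) - length + 1)
    ]


# Hypothetical 12-residue sequence used only to show the output shape.
for position, peptide in candidate_peptides("ACDEFGHIKLMN", length=10):
    print(position, peptide)
```

A protein shorter than the requested length yields an empty list, which corresponds to the case in the text where the entire sequence of a small molecule would be used instead.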

For the production of antibodies, various hosts including goats, rabbits, rats, mice, etc. may be 
immunized by injection with the protein products (or any portion, fragment, or oligopeptide thereof 
which retains immunogenic properties) of the candidate genes of the invention. Depending on the host 
species, various adjuvants may be used to increase the immunological response. Such adjuvants 
include but are not limited to Freund's, mineral gels such as aluminum hydroxide, and surface active 
substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet 
hemocyanin, and dinitrophenol. BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are 
potentially useful human adjuvants. 

1. Polyclonal antibodies. 

The antigen protein may be conjugated to a conventional carrier in order to increase its 
immunogenicity, and an antiserum to the peptide-carrier conjugate will be raised. Coupling of a peptide 
to a carrier protein and immunizations may be performed as described (Dymecki et al., 1992, J. Biol. 
Chem., 267: 4815). The serum can be titered against protein antigen by ELISA (below) or 
alternatively by dot or spot blotting (Boersma and Van Leeuwen, 1994, J Neurosci. Methods, 51: 317). 
At the same time, the antiserum may be used in tissue sections prepared as described. A useful serum 
will react strongly with the appropriate peptides by ELISA, for example, following the procedures of 
Green et al., 1982, Cell, 28: 477. 
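
Titering a serum by ELISA is often summarized as an endpoint titer. The sketch below shows one common convention, which is an assumption here and not necessarily the procedure of the cited references: the titer is the highest dilution factor whose signal still exceeds a chosen cutoff. The function name, the cutoff, and the readings in the usage lines are hypothetical illustration values, not data.

```python
def endpoint_titer(readings: dict, cutoff: float):
    """Return the highest dilution factor whose ELISA signal exceeds the cutoff.

    `readings` maps a dilution factor (e.g. 100 for 1:100) to absorbance.
    Returns None when no dilution is above the cutoff. Choosing the cutoff
    (for example a multiple of the pre-immune signal) is left to the user.
    """
    positive = [dilution for dilution, signal in readings.items() if signal > cutoff]
    return max(positive) if positive else None


# Hypothetical absorbance values for a tenfold dilution series.
series = {100: 1.80, 1000: 0.90, 10000: 0.25, 100000: 0.08}
print(endpoint_titer(series, cutoff=0.20))
```

The simple maximum assumes that the signal falls steadily with dilution; a series that does not behave this way would need to be inspected by hand.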

2. Monoclonal antibodies. 

Techniques for preparing monoclonal antibodies are well known, and monoclonal antibodies 
may be prepared using a candidate antigen whose level is to be measured or which is to be either 
inactivated or affinity-purified, preferably bound to a carrier, as described by Arnheiter et al., 1981, 
Nature, 294:278. 

Monoclonal antibodies are typically obtained from hybridoma tissue cultures or from ascites 
fluid obtained from animals into which the hybridoma tissue was introduced. 

Monoclonal antibody-producing hybridomas (or polyclonal sera) can be screened for antibody 
binding to the target protein. 

3. Antibody Detection Methods

Particularly preferred immunological tests rely on the use of either monoclonal or polyclonal
antibodies and include enzyme-linked immunoassays (ELISA), immunoblotting and
immunoprecipitation (see Voller, 1978, Diagnostic Horizons, 2:1, Microbiological Associates Quarterly
Publication, Walkersville, MD; Voller et al., 1978, J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No.
31,006; UK Patent 2,019,408; Butler, 1981, Methods Enzymol., 73: 482; Maggio, E. (ed.), 1980,
Enzyme Immunoassay, CRC Press, Boca Raton, FL) or radioimmunoassays (RIA) (Weintraub, B.,
Principles of radioimmunoassays, Seventh Training Course on Radioligand Assay Techniques, The
Endocrine Society, March 1986, pp. 1-5, 46-49 and 68-78). For analysing tissues for the presence or
absence of a protein produced by a candidate gene according to the present invention,
immunohistochemistry techniques may be used. It will be apparent to one skilled in the art that the
antibody molecule may have to be labelled to facilitate easy detection of a target protein. Techniques
for labelling antibody molecules are well known to those skilled in the art (see Harlow and Lane, 1989,
Antibodies, Cold Spring Harbor Laboratory).

J. Preparation of a Labeled Protein

1. Labeling of protein

Labeling techniques are useful, according to the invention, for studying the biochemical 
properties, processing, intracellular transport, secretion and degradation of proteins. 

Biosynthetic labeling of proteins produced by candidate genes of the invention is preferably
performed with 35S-methionine due to the high specific activity (>800 Ci/mmol) and ease of detection
of this amino acid. Another amino acid should be used to label a protein that contains little or no 
methionine. 

According to the following protocol, either suspension cells or adherent cells are labeled with
35S-methionine. Briefly, cells are washed and incubated for 15 min at 37°C in short-term labeling
medium (complete serum-free, methionine-free RPMI or DMEM containing 5% (v/v) dialyzed fetal
bovine serum) to deplete intracellular pools of methionine. Cells are then incubated in the presence of
35S-methionine working solution (0.1 to 0.2 mCi/ml in 37°C short-term labeling medium) such that 4 ml
of 35S-methionine working solution is added per 2 x 10^7 suspension cells and 2 to 4 ml of
35S-methionine working solution is added per 100 mm dish of adherent cells (0.5-2 x 10^7 cells), for a period
of 30 min to 3 hours in a humidified, 37°C, 5% CO2 incubator. Upon completion of labeling, suspension
cells are washed by centrifugation in ice-cold PBS. Following removal of labeling medium, adherent
cells are washed with PBS, scraped and collected by centrifugation. Labeled cells are processed and
analyzed by immunoaffinity chromatography, immunoprecipitation and one- and two-dimensional gel
electrophoresis (Ausubel et al., supra).
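The volumes quoted in the short-term labeling protocol scale linearly with the amount of material, so the quantity of working solution and label can be estimated in advance. The following is a minimal sketch of that arithmetic only; the function name, its parameters and the example batch are illustrative assumptions and are not part of the protocol.

```python
def labeling_plan(n_suspension_cells=0.0, n_adherent_dishes=0,
                  activity_mci_per_ml=0.1, ml_per_dish=4.0):
    """Return (total working-solution volume in ml, total activity in mCi)."""
    # Ranges quoted in the protocol text.
    if not 0.1 <= activity_mci_per_ml <= 0.2:
        raise ValueError("working solution is described as 0.1 to 0.2 mCi/ml")
    if not 2.0 <= ml_per_dish <= 4.0:
        raise ValueError("adherent dishes receive 2 to 4 ml each")
    # 4 ml of working solution per 2 x 10^7 suspension cells.
    suspension_ml = 4.0 * n_suspension_cells / 2e7
    # 2 to 4 ml per 100 mm dish of adherent cells.
    adherent_ml = ml_per_dish * n_adherent_dishes
    total_ml = suspension_ml + adherent_ml
    return total_ml, total_ml * activity_mci_per_ml


# Hypothetical batch: 4 x 10^7 suspension cells plus two adherent dishes.
volume_ml, activity_mci = labeling_plan(n_suspension_cells=4e7,
                                        n_adherent_dishes=2,
                                        activity_mci_per_ml=0.2,
                                        ml_per_dish=2.0)
print(f"{volume_ml:.1f} ml working solution, {activity_mci:.2f} mCi total")
```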

If the protein of interest is synthesized at a relatively low rate or is in a steady state, it may be
necessary to label cells for an extended period of time. When performing long-term biosynthetic
labeling of cells, it is necessary to include unlabeled methionine in the medium to maintain cell viability
and to ensure that incorporation of label is maintained during the course of the experiment. According
to this method, cells can be labeled in the presence of 35S-methionine in long-term labeling medium
(90% methionine-free RPMI or DMEM) for up to 16 hours (Ausubel et al., supra).

2. In vitro Translation 

The protein product of the cloned candidate gene of the invention can be produced by the
methods of in vitro transcription and in vitro translation. In vitro transcription is performed essentially
as described in Section B entitled "Production of a Polynucleotide Sequence" in the absence of a
labeled ribonucleoside. The RNA produced by the in vitro transcription reaction will be extracted with
phenol, ethanol precipitated twice and resuspended in 10 µl of TE buffer. In vitro translation is
performed by adding 1 to 10 µl of RNA to an in vitro translation kit (e.g. wheat germ or reticulocyte
lysate) in the presence of 15 µCi [35S]methionine, following the directions provided by the
manufacturer. A typical reaction is carried out in a 30 µl volume at room temperature for 30 to 60
minutes (Ausubel et al., supra).


K. Production of Cells Expressing a Nucleotide Sequence Comprising a Polymorphism 

Mammalian cells expressing a nucleotide sequence comprising a polymorphism are useful,
according to the invention, for determining the biochemical and functional properties of the protein
product of a nucleotide sequence comprising a polymorphism, for analyzing expression of a candidate
gene, for large scale production of a protein of interest, for drug screening and for the production of
transgenic animals or knockout mice.

Methods of efficiently introducing foreign DNA into mammalian cells are known in the art and 
include calcium phosphate transfection, DEAE-dextran transfection, electroporation and
liposome-mediated transfection (Ausubel et al., supra).

Transfection Protocols 

1. Calcium-Phosphate Transfection

The method of calcium phosphate transfection involves preparing a precipitate by slowly
mixing a HEPES-buffered saline solution with a mixture of calcium chloride and DNA. According to
this method, up to 10% of the cells on a dish will incorporate DNA.

Cells to be transfected are split one day prior to transfection so that on the day of transfection
cells are well-separated on the plate. A 10 cm dish of cells is fed with 9.0 ml of complete medium
approximately 2 to 4 hours before the addition of the precipitate. DNA to be transfected (10-50 µg per
10 cm plate) is ethanol precipitated, resuspended in 450 µl sterile water and mixed with 50 µl of 2.5 M
CaCl2. The DNA/CaCl2 solution is added dropwise to a 15 ml conical tube containing 500 µl 2X
HeBS (0.283 M NaCl, 0.023 M HEPES acid, 1.5 mM Na2HPO4, pH 7.05). It is preferable to bubble
the HeBS solution during the addition of the DNA mixture. After the precipitate has formed for 20
minutes at room temperature, it is added evenly to the cells. The cells are incubated with the
precipitate at 37°C in a humidified CO2 incubator for 4-16 hours. Following removal of the precipitate,
the cells are washed with PBS and fed in complete medium. Glycerol or dimethyl sulfoxide shock can
be used to increase the DNA uptake by certain types of cells (Ausubel et al., supra).
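When several plates are transfected in parallel, the per-plate volumes of the calcium phosphate protocol can be scaled and the resulting CaCl2 concentrations checked. The sketch below assumes the per-plate volumes are in microliters (450 of water, 50 of 2.5 M CaCl2, 500 of 2X HeBS); the function name and the returned dictionary keys are hypothetical.

```python
# Per-plate volumes in microliters (assumed units), one 10 cm plate each.
PER_PLATE_UL = {"water": 450.0, "cacl2_2p5M": 50.0, "hebs_2x": 500.0}
CACL2_STOCK_M = 2.5


def calcium_phosphate_mix(n_plates, dna_ug_per_plate=20.0):
    """Scale the precipitate recipe to n_plates and report CaCl2 molarity."""
    if n_plates < 1:
        raise ValueError("need at least one plate")
    if not 10.0 <= dna_ug_per_plate <= 50.0:
        raise ValueError("text gives 10-50 ug DNA per 10 cm plate")
    vol = {name: ul * n_plates for name, ul in PER_PLATE_UL.items()}
    dna_cacl2_ul = vol["water"] + vol["cacl2_2p5M"]   # DNA/CaCl2 solution
    total_ul = dna_cacl2_ul + vol["hebs_2x"]          # after adding to HeBS
    cacl2_umol = CACL2_STOCK_M * vol["cacl2_2p5M"]    # M x ul = umol
    return {
        "volumes_ul": vol,
        "dna_ug": dna_ug_per_plate * n_plates,
        "total_ul": total_ul,
        "cacl2_in_dna_mix_M": cacl2_umol / dna_cacl2_ul,
        "cacl2_final_M": cacl2_umol / total_ul,
    }


mix = calcium_phosphate_mix(n_plates=3)
print(mix)
```

With these volumes the HeBS is diluted from 2X to 1X on mixing, which is consistent with adding equal volumes of DNA/CaCl2 solution and 2X HeBS.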

2. DEAE-Dextran Transfection

Cells to be transfected are plated at a concentration such that after 3 days of growth they are
30-50% confluent. The DNA to be transfected (approximately 4 µg) is ethanol precipitated,
resuspended in 40 µl TBS and added slowly while shaking to 80 µl of warm 10 mg/ml DEAE-dextran
in TBS. After cells have been washed with PBS and fed with 4 ml of DMEM containing 10% Nu
Serum per 10 cm dish, the DEAE-dextran/DNA mixture is evenly distributed over the entire plate. Cells
are incubated with the DNA for approximately 4 hours in a humidified CO2 incubator. Following the
removal of the DEAE-dextran/DNA mixture, cells are shocked by the addition of 5 ml of 10% DMSO
in PBS. After a 1 minute incubation at room temperature, cells are washed with PBS and fed with
complete medium (Ausubel et al., supra).


3. Electroporation 

Alternatively, DNA can be introduced into cells by the use of high-voltage electric shocks, a 
technique termed electroporation. Briefly, according to the method of electroporation, cells are 
suspended in an appropriate electroporation buffer and placed in an electroporation cuvette. Following
the addition of DNA, the cuvette is connected to a power supply and the cells are subjected to a
high-voltage electrical pulse of a defined magnitude and length, optimized for the cell type being
transfected. After a brief period of recovery, the cells are placed in normal culture medium.

A population of cells to be transfected by electroporation is grown to late-log phase in
complete medium. Typically stable transfection requires 5 x 10^6 cells, and transient transfection
requires 1-4 x 10^7 cells. Cells are harvested by centrifugation for 5 minutes at 640 x g at 4°C. The
resulting cell pellet is resuspended in half of the original volume of ice-cold electroporation buffer (e.g.
PBS without calcium or magnesium, HEPES-buffered saline, tissue culture medium without serum, or
phosphate-buffered sucrose (272 mM sucrose/7 mM K2HPO4, pH 7.4/1 mM MgCl2)). The choice of
an electroporation buffer is dictated by the cell line. Cells are then harvested by centrifugation for 5
minutes at 640 x g at 4°C, and resuspended at 1 x 10^7/ml in electroporation buffer at 0°C for stable
transfection or at a higher concentration (up to 8 x 10^7/ml) for transient transfection. Aliquots of the
cells (0.5 ml) are transferred into the desired number of electroporation cuvettes and placed on ice.

DNA is added to the cell suspension in the cuvettes on ice. For stable transfection, DNA
(optimally 1-10 µg) should be linearized with a restriction enzyme that cuts at a site in a non-essential
region, purified by phenol extraction and ethanol precipitated. Supercoiled DNA (optimally 10 µg) may
be used for transient transfection. The DNA/cell suspension is mixed, and incubated on ice for 5
minutes.

The cuvette is placed in the holder in the electroporation apparatus (at room temperature) and
shocked one or more times at the desired voltage and capacitance settings. An electroporation
apparatus useful according to the invention is the Bio-Rad Gene Pulser. The number of shocks and the
voltage and capacitance settings will vary depending on the cell type, and should be optimized. The
two parameters that are critical for successful electroporation are the maximum voltage for the shock
and the duration of the current pulse.

Following electroporation, the cuvette containing the mixture of cells and DNA is incubated on 
ice for 10 minutes. The transfected cells are diluted 20-fold in complete culture medium. For stable 
transfection, cells are grown for 48 hours in nonselective medium and then transferred to
antibiotic-containing medium. For transient transfection, cells are incubated 50-60 hours and then harvested for
the desired transient assay.
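The cell numbers and densities given for electroporation determine how much buffer to use and how many 0.5 ml cuvette aliquots a harvest will yield. The following sketch simply restates that arithmetic; the function name and arguments are illustrative assumptions, and the transient default uses the upper density bound mentioned in the text.

```python
def electroporation_aliquots(total_cells, stable=True, cells_per_ml=None,
                             aliquot_ml=0.5):
    """Return (resuspension volume in ml, whole cuvettes, cells per cuvette)."""
    if cells_per_ml is None:
        # 1 x 10^7/ml for stable transfection; up to 8 x 10^7/ml for
        # transient transfection (upper bound used as the default here).
        cells_per_ml = 1e7 if stable else 8e7
    resuspension_ml = total_cells / cells_per_ml
    # Small epsilon guards against floating-point results such as 3.9999999.
    n_cuvettes = int(resuspension_ml / aliquot_ml + 1e-9)
    cells_per_cuvette = cells_per_ml * aliquot_ml
    return resuspension_ml, n_cuvettes, cells_per_cuvette


# Hypothetical harvest of 2 x 10^7 cells for a stable transfection.
print(electroporation_aliquots(2e7, stable=True))
```

At the stable-transfection density each 0.5 ml aliquot holds 5 x 10^6 cells, and at the maximum transient density 4 x 10^7 cells, which matches the cell numbers the section says each type of transfection requires.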

L. Production of Animals Expressing a Nucleotide Sequence Comprising a 
Polymorphism 

Transgenic animals expressing a construct comprising a candidate gene containing a
polymorphism, according to the invention, can be produced by methods well known in the art (reviewed
in Reeves et al., supra). Knockout mice wherein a candidate gene according to the invention has been
disrupted can be produced by methods well known in the art (reviewed in Moreadith and Radford,
1997, J. Mol. Med., 75:208 and Shastry, 1998, Mol. Cell. Biochem., 181:163). These animals provide
useful models for studying the functional consequences of one or more polymorphisms in a gene of 
interest. 

M. Production of a Candidate Gene Library 

The invention provides a method of producing a candidate gene library comprising genes that
are potentially associated with the susceptibility to, or pathogenesis of, a disease. A candidate gene
library is useful for determining the genetic basis of a disease of interest.

Genetic susceptibility to a disease must occur as a result of specific DNA differences relative
to non-susceptible individuals. In the case of osteoarthritis, many genes are known which are
potentially involved in the susceptibility to, or pathogenesis of, the disease. These genes are included in
the candidate gene library and the association of these genes with osteoarthritis is determined from
population studies according to the invention. Unlike linkage studies, wherein a region of the genome
that is thought to be involved in a disease is determined, the candidate gene strategy, including
association studies, addresses the involvement of a particular gene in a disease. The results of
association studies of candidate genes are used to identify genes that should be intensively studied as
potential therapeutics or therapeutic targets.

According to the invention, the full range of polymorphic sites within each candidate gene is
identified and examined in diseased and normal populations. The frequency of each gene variant
(allele) in one population is then compared with its frequency in the other. If a specific polymorphism under analysis
contributes to the disease phenotype, it will be present in the diseased population at a higher frequency
than in the normal population. In addition, if the specific polymorphism under analysis does not itself
contribute to the disease phenotype but resides elsewhere in, or is near to, a gene containing a
contributory polymorphism, a significant association may be seen with the polymorphic marker being
tested. This is because the two markers are in linkage disequilibrium with each other due to their close
proximity.
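The comparison of allele frequencies between diseased and normal populations can be illustrated with a simple case-control calculation. The sketch below is one common way to test such a difference (a Pearson chi-square test on a 2 x 2 table of allele counts, 1 degree of freedom, no continuity correction); the source does not prescribe this test, and the function name and the allele counts are hypothetical.

```python
import math


def allele_association(case_counts, control_counts):
    """Compare variant-allele frequency in cases and controls.

    Each argument is (variant allele count, reference allele count).
    Returns (frequency in cases, frequency in controls, chi-square, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    freq_cases = a / (a + b)
    freq_controls = c / (c + d)
    # Pearson chi-square for a 2 x 2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return freq_cases, freq_controls, chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per person.
f_case, f_ctrl, chi2, p = allele_association((60, 140), (40, 160))
print(f"cases {f_case:.2f}, controls {f_ctrl:.2f}, chi2 {chi2:.2f}, p {p:.3f}")
```

A significant result for a marker does not by itself distinguish a contributory polymorphism from one in linkage disequilibrium with it, as the paragraph above notes.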

1. Strategies for Identifying Genes Associated with a Disease 

There are a number of methods known in the art for the identification of genes involved in a 
disease. These methods include familial linkage studies followed by positional cloning, differential gene
expression studies on tissues, and population-based candidate gene association studies. Although
positional cloning has proven to be useful for diseases resulting from a single mutation, this technique is
not suitable for identifying genetic linkage in diseases where multiple genetic variants combine to
create disease susceptibility. Furthermore, it has been demonstrated that the etiological basis of the
majority of diseases comprises more than one gene.

The goal of linkage studies is to determine the approximate position of disease genes by 
studying related individuals in families. According to linkage strategies, DNA markers that are 
randomly spaced throughout the genome, but are rarely located within genes, are tested for the
frequency of their presence along with the particular disease phenotype. There is approximately a
50% chance of an unlinked gene and marker gene co-localizing. If a particular marker is present at a
significantly higher frequency than expected in disease individuals, this indicates that the marker is
located in the vicinity of the disease gene. Usually the disease gene is delimited to a large region
(containing tens to hundreds of genes). After a disease gene has been grossly mapped, this entire
region must be extensively characterized to determine what genes are present in the region. Any gene
that is identified according to this method becomes a candidate gene.

Linkage studies have been used successfully to identify the genes responsible for certain 
genetic diseases originating from mutations in a single gene (monogenic diseases). However, most 
common human diseases are of polygenic origin, wherein changes in multiple genes cause an
increased susceptibility to or pathogenesis of a particular disease. Because the DNA changes
associated with genes which contribute to polygenic diseases are common in the population, thereby
diluting the contribution of a given region of the genome to the disease, it is difficult to perform linkage
studies on diseases of polygenic origin.

Linkage analysis

A series of genetic crosses is performed in an animal model system of a particular defect that
is characteristic of a disease of interest (e.g. osteoarthritis) between individuals having an observable
mutant phenotype and normal individuals of a control strain. At least one disease-related locus is used
as a marker in these crosses. Alternatively, linkage analysis can be performed using chromosomal
markers that do not comprise a disease-related locus (described below). If non-random assortment of
the mutant trait with a marker locus is observed, and if that non-random assortment is statistically
significant (for example, if a Student's t test or ANOVA is applied to the results) the trait is linked to
the marker locus.
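Non-random assortment in a cross can be quantified by counting progeny in which trait and marker stay together (parental) or separate (recombinant). The sketch below estimates the recombination fraction and tests it against the 1:1 ratio expected for unlinked loci. It uses a chi-square goodness-of-fit test, which is a substitution for the Student's t test or ANOVA named in the text; the function name and progeny counts are hypothetical.

```python
import math


def backcross_linkage(parental, recombinant):
    """Return (recombination fraction, chi-square vs 1:1, p-value)."""
    n = parental + recombinant
    if n == 0:
        raise ValueError("no progeny scored")
    theta = recombinant / n
    expected = n / 2.0  # unlinked loci: half parental, half recombinant
    chi2 = ((parental - expected) ** 2
            + (recombinant - expected) ** 2) / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return theta, chi2, p_value


# Hypothetical cross: 80 parental and 20 recombinant progeny.
theta, chi2_link, p_link = backcross_linkage(80, 20)
print(f"theta {theta:.2f}, chi2 {chi2_link:.1f}, p {p_link:.2e}")
```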






Similarly, linkage analysis using an existing human or other mammalian pedigree may be 
performed. Pedigree analysis is a useful technique for identifying genes for which variant alleles may 
contribute to the risk, onset or progression of a disease in a family containing multiple individuals 
afflicted with a disease; according to this method, numerous genetic loci from affected and unaffected 
family members are compared. Non-random assortment of a given genetic marker between affected
and unaffected family members relative to the distributions observed for other genetic loci indicates 
that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in 
physical proximity to another that does so. 

If a non-random assortment of the disease-related phenotype with a marker locus is observed, 
using either approach, this is indicative of an association between the gene underlying the defect and
that locus. Because the strength of any conclusion drawn from linkage analysis is statistically-based, 
the accuracy of the results is thought to be proportional to the number of crosses or family members 
and genetic loci analyzed. 

Positional Cloning

If linkage is confirmed it is preferable to perform a molecular analysis of the region in which
the peak of linkage maps. The wide availability of yeast artificial chromosome (YAC) or bacterial
artificial chromosome (BAC) libraries facilitates this analysis. A nucleic acid sequence specific for a
region encompassing a gene which is determined to occupy a map location of a particular locus of
interest is examined, and open reading frames are evaluated to determine their relationship with the
observed phenotype. An initial evaluation may be performed with the assistance of a computer
program, such as the PathCalling™ (CuraGen) biological pathway discovery platform. All or a subset
of the open reading frames present in the region are then cloned (e.g., by PCR) from mutant animals
or affected family members and from their healthy counterparts (either control animals or unaffected
family members), and the sequences of these open reading frames are compared. If a mutation or
other allelic variant is found to be linked to individuals displaying the disease phenotype (in a
statistically-significant, non-random manner), it can be concluded that this mutation is associated with a
disease phenotype. A nucleic acid fragment containing this gene can be labeled and used as a probe
for in situ hybridization analysis of fixed chromosomes of the human or other mammal to determine
precisely the physical location of the gene. Furthermore, a gene that has been mapped and isolated in
this manner may be useful as a candidate target for disease diagnosis and for drug targeting according
to the invention (see below).

2. Identification of Genes to be Included in Candidate Gene Library

A candidate gene library according to the invention will include i. genes that are involved in
known or predicted disease pathways, ii. new genes that are identified by a relevant pattern of specific
tissue or cell expression, iii. genes that map to genomic regions of known linkage, and iv. gene
sequences (from sequence databases) that are homologs of the above referenced categories of
potential candidate genes. The choice of potentially related genes to be selected from a database will
depend on the percent identity as calculated by Fast DB and based upon mismatch penalty, gap
penalty, gap size penalty and joining penalty. Figure 1 summarized

Based on the physiological changes associated with a disease of interest, predictions can be
made regarding a cell or tissue-type that would be expected to express high or low levels of candidate
genes associated with a particular disease. For osteoarthritis, it is expected that muscle, adipose,
pancreas or liver tissue or tissue comprising insulin-secreting pancreatic β-cells, would be useful for
identifying candidate genes according to the invention.

Differences in the expression of known and unknown genes in normal and disease tissue can
be determined by methods known in the art including Serial Analysis of Gene Expression (SAGE)
(Velculescu et al., 1995, Science, 270:484), subtractive hybridization/screening (described below),
differential display (Liang and Pardee, 1992, Science, 257:967) and high-density microarray expression
testing.

The technique of SAGE allows for the rapid, detailed analysis of thousands of transcripts.
SAGE depends on the following two principles. First, sufficient information is contained within a short
nucleotide sequence (approximately 9-10 bp), isolated from a defined location within a transcript, to
uniquely identify a transcript. Second, the concatenation of short tags of sequence allows transcripts to
be analyzed serially by sequencing multiple tags within a single clone.

The method of SAGE is performed by synthesizing double-stranded cDNA from mRNA,
cleaving the resulting cDNA with an anchoring restriction endonuclease that is expected to cleave
most transcripts at least one time, and isolating the most 3' region of the cleaved cDNA by binding to
streptavidin beads. This protocol allows for the identification of a unique site on a transcript that
corresponds to the restriction site located closest to the polyA tail. Replicate samples of the most 3'
region of the cDNA are ligated to one of two linker molecules that contain a Type IIS restriction site
for a tagging enzyme. The cleavage site for Type IIS restriction endonucleases is located at a defined
distance up to 20 bp from the asymmetric recognition site. Linkers are designed such that upon
cleavage of the ligation product with the tagging enzyme there is release of the linker and an attached
short region of cDNA.

Following the creation of blunt ends, the two pools of released tags are ligated to each other
and the resulting ligated product is used as a template for PCR amplification in the presence of
primers that are specific for each linker. The PCR product is cleaved with the anchoring enzyme and
amplification products, comprising two tags linked tail to tail, are isolated, concatenated by ligation,
cloned and sequenced (Velculescu et al., supra).
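The tag definition used by SAGE (a short sequence immediately 3' of the anchoring site closest to the polyA tail) can be mimicked computationally on known transcript sequences, for example to predict which tag a candidate gene should yield. The following is a minimal sketch under stated assumptions: the anchoring site is taken to be CATG (the recognition site of NlaIII, which the text does not name), the tag length is fixed at 10 bp, and the function names and sequences are invented for illustration.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: a 4-base anchoring enzyme site (NlaIII)
TAG_LENGTH = 10       # the text describes tags of roughly 9-10 bp


def sage_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag next to the 3'-most anchoring site, or None."""
    seq = cdna.upper()
    pos = seq.rfind(anchor)  # site located closest to the 3' end
    if pos == -1:
        return None          # transcript is not cut by the anchoring enzyme
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Tally tags across transcripts; counts stand in for expression level."""
    tags = (sage_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


example = "AAACATGGGGCATGACGTACGTACTTTT"  # made-up sequence
print(sage_tag(example))
print(count_tags([example, example, "TTTTTTTT"]))
```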

Differential display provides a method for separating and cloning individual mRNAs by PCR 
analysis. According to the method of differential display, oligonucleotide primers are selected wherein 
one primer is anchored to the polyadenylate tail of a subset of mRNA species and the other primer is 
short and of an arbitrary sequence such that it anneals at different positions relative to the first primer.
The mRNA subpopulations that are identified with these primer pairs are subjected to reverse
transcription, amplified and analyzed on a DNA sequencing gel. By using multiple sets of primers, a
reproducible pattern of amplified cDNA fragments that demonstrate a requirement for the sequence
specificity of either primer can be obtained (Liang and Pardee, supra).

According to the method of high-density microarray expression testing, DNA sequences to be
tested for expression are spotted onto a surface, usually at high density to allow for the testing of
many genes. The surface containing the DNA sequences is typically referred to as a 'chip'. The spotted
DNA can be either cDNA clones or oligonucleotides. RNA is prepared from the two cells or tissues
to be compared. The RNA from one cell/tissue will be labeled red and the RNA from the other
cell/tissue will be labeled yellow. Both RNA preparations are hybridized to the DNA array. The ratio
of red to yellow is indicative of the relative levels of expression between the two cells/tissues.
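The ratio of the two label intensities at each spot is usually easier to interpret on a log scale, where equal expression gives 0 and a doubling or halving gives +1 or -1. The sketch below shows that calculation; the function name, the intensity floor and the gene names are illustrative assumptions, and real analyses would also normalize the two channels.

```python
import math


def log2_ratios(channel_a, channel_b, floor=1.0):
    """Per-gene log2(A/B) for two dictionaries of spot intensities."""
    ratios = {}
    # Only genes measured in both channels can be compared.
    for gene in channel_a.keys() & channel_b.keys():
        a = max(channel_a[gene], floor)  # floor avoids division by zero
        b = max(channel_b[gene], floor)
        ratios[gene] = math.log2(a / b)
    return ratios


# Hypothetical intensities for two tissues.
tissue_1 = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
tissue_2 = {"geneA": 200.0, "geneB": 100.0}
ratios = log2_ratios(tissue_1, tissue_2)
print(ratios)
```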

3. Mapping a candidate gene 

Molecular and cytogenetic methods of mapping candidate genes are known in the art and are 
summarized below. Linkage analysis provides a method for identifying genes mapping to genomic 
regions of known linkage.

Linkage analysis 

As described above, linkage analysis may be performed between an unmapped candidate 
gene and one or more of the disease-related loci or by analyzing the genetic linkage between the 
candidate gene and chromosomal markers which are not themselves linked to a disease-related locus,
according to the same method. For the latter type of analysis it is preferable that the spacing of 
markers throughout the genome of the test organism is approximately one every cM or less. This 
spacing will ensure complete coverage of the genome and will facilitate accurate mapping. 

Other methods for mapping a candidate gene are provided below.

Syntenic similarity

As a result of classical genetic studies and, more recently, multi-laboratory genomic
sequencing collaborations such as the Human Genome Project and Mouse Genome Project, the
human and mouse genomes have been extensively characterized. It is now known that there is a
significant degree of co-linearity among humans, mice and rats, wherein the chromosomal map
positions of numerous genes and groups of genes are conserved among these species.
Examination of the human and/or mouse chromosomal maps in the regions
comparable to those to which a particular locus of interest maps in the rat will yield candidate genes
which may be responsible for the physiological changes associated with a disease of interest. The
methods of radiation hybrid mapping or fluorescence in situ hybridization at low stringency to rat
chromosomes using labeled fragments derived from the human or mouse genes can be used to
confirm that genes present in these regions of the human and/or mouse are present in the regions of
interest in the rat.

Radiation hybrid (RH) mapping is a somatic cell hybrid technique that was developed to
create high resolution, contiguous maps of mammalian chromosomes. The method is useful for
ordering DNA markers spanning millions of base pairs of DNA at a resolution not easily obtained by
other mapping methods (Cox et al., 1990, Science, 250: 245; Burmeister et al., 1991, Genomics, 9:19;
Warrington et al., 1992, Genomics, 13: 803; Abel et al., 1993, Genomics, 17:632). Radiation hybrid
mapping facilitates the mapping of non-polymorphic DNA markers that cannot be used for meiotic
mapping.

According to the method of radiation hybrid mapping, a lethal dose of X-irradiation is used to
fragment the chromosomes of the donor cell line. Chromosome fragments from the donor cell line are
then retained, in a non-selective manner, following cell fusion with a recipient cell line. The resulting
hybrid clones are then analyzed for the presence or absence of specific donor chromosome markers.
It is expected that markers that are further apart on a chromosome are more likely to be broken apart
by radiation and to segregate independently in the RH cells than markers that are closer together. By
performing a statistical analysis of the co-segregation of various loci in hybrid clones, it is possible to
construct a map that provides information regarding the relative order and distance of markers (Cox et
al., 1990, supra; Warrington et al., 1991, Genomics, 11: 701; Ceccherini et al., 1992, Proc. Natl. Acad.
Sci. USA, 89: 104).
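The statistical analysis of co-segregation can be illustrated for a single pair of markers. The sketch below assumes a simple model in which a break between two markers (probability theta) leaves them retained independently, so that the expected fraction of discordant clones is theta x (R_A + R_B - 2 R_A R_B), where R_A and R_B are the retention fractions. Solving for theta and converting it with the mapping function D = -ln(1 - theta) gives a distance estimate. The model, the function name and the retention data are assumptions for illustration rather than the procedure of the cited references.

```python
import math


def rh_breakage(marker_a, marker_b):
    """Estimate breakage frequency and map distance for two markers.

    marker_a and marker_b are equal-length lists of booleans, one entry per
    hybrid clone, True where the marker is retained.
    """
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("need the same non-empty set of clones for both")
    t = len(marker_a)
    r_a = sum(marker_a) / t  # retention fraction of marker A
    r_b = sum(marker_b) / t  # retention fraction of marker B
    discordant = sum(1 for x, y in zip(marker_a, marker_b) if x != y)
    denom = t * (r_a + r_b - 2.0 * r_a * r_b)
    if denom == 0:
        raise ValueError("markers retained in all or no clones are uninformative")
    theta = min(discordant / denom, 1.0)
    distance = -math.log(1.0 - theta) if theta < 1.0 else math.inf
    return theta, distance


# Hypothetical typing of two markers across ten hybrid clones.
a = [True, True, True, True, False, False, False, False, False, False]
b = [True, True, True, False, True, False, False, False, False, False]
theta_rh, dist_rh = rh_breakage(a, b)
print(f"theta {theta_rh:.3f}, distance {dist_rh:.3f}")
```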

Subtractive screening

In view of the observation that only a subset of an organism's genes are expressed in a given
tissue, there is a high probability that transcripts which differ in expression between cells of the same 
tissue in a mutant and control animal are responsible for the observed mutant phenotype. 

According to the method of subtractive cloning, mRNA is isolated from a tissue of choice,
wherein the tissue is obtained from two distinct organisms and wherein one organism displays a
mutant phenotype with regard to a particular trait while the other is normal in that respect. Methods
well known in the art are used to prepare cDNA from the mRNA derived from the first organism. The
mRNA template is then degraded, either by hydrolysis under alkaline conditions or by RNase
H-mediated cleavage, and the cDNA is returned to a buffer in which mRNA is stable, and mixed with a
molar excess of mRNA prepared from the second organism under conditions of stringent
hybridization. The mixture is then passed over a hydroxyapatite column, which binds double-stranded
nucleic acids but allows single-stranded nucleic acid molecules to pass through. Reverse transcripts
derived from the first sample which do not hybridize to mRNA molecules derived from the second
organism (in other words, reverse transcripts specific to the first tissue sample) are present in the
flow-through fraction and are cloned into a vector to create a subtraction library. The reciprocal
experiment (in which the cDNA is derived from the second mRNA preparation) is also carried out to
create a complete set of transcripts specific to the tissue samples derived from the two organisms.
This procedure will provide transcripts that can be labeled and used as probes in in situ
hybridization analysis of immobilized chromosomes. The method of subtractive screening, therefore,
yields both cloned genes and reagents useful for determining if the cloned genes co-localize with
a locus of interest. If a particular gene is found to co-localize to a locus of interest, the gene may be
analyzed functionally (e.g., in a phenotypic rescue experiment, as described below, or by the
phenotypic assays described in Section F entitled "Identification and Characterization of
Polymorphisms"). Ultimately, these genes may be used as targets for drugs or disease diagnostic
methods, or even as therapeutic nucleic acids.
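The logic of the forward and reciprocal subtractions can be pictured as two set differences over the transcripts present in each sample. The sketch below is only a conceptual model with hypothetical names and transcript identifiers; actual subtraction depends on hybridization kinetics and transcript abundance, which a presence/absence model ignores.

```python
def subtraction_libraries(transcripts_first, transcripts_second):
    """Model forward and reciprocal subtraction as set differences."""
    first = set(transcripts_first)
    second = set(transcripts_second)
    return {
        # cDNA from the first sample that finds no partner in the second
        "first_specific": first - second,
        # reciprocal experiment: cDNA from the second sample
        "second_specific": second - first,
    }


# Hypothetical transcript identifiers for mutant and control tissue.
mutant = ["t1", "t2", "t3", "t5"]
control = ["t1", "t2", "t4"]
libraries = subtraction_libraries(mutant, control)
print(libraries)
```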

Mutagenic transposon mapping 

The selection of insertional events that lie within genes (e.g., within coding or regulatory
sequences) is facilitated by the use of entrapment vectors, first described in bacteria (Casadaban and
Cohen, 1979, Proc. Natl. Acad. Sci. U.S.A., 76: 4530; Casadaban et al., 1980, J. Bacteriol., 143: 971).
By employing animal models, entrapment vectors can be introduced into pluripotent ES cells in culture
(for example, using electroporation or a retrovirus) and then passed into the germline via chimeras
(Gossler et al., 1989, Science, 244: 463; Skarnes, 1990, Biotechnology, 8:827). Alternatively, transgenic
animals containing entrapment vectors may be generated by standard oocyte injection protocols.

These methods result in DNA integrations that are highly mutagenic because they interrupt
the endogenous coding sequence. It is estimated that the frequency of obtaining a mutation in some
gene in the genome using a promoter or gene trap is about 45%. For a detailed description of
retroviral insertion mutagenesis see Methods Enzymol., vol. 225, 1990. Genes which are expressed in
a tissue of interest and for which a biochemical assay of a particular activity has been developed in
animal models are most useful according to this method. Promoter or gene trap vectors often contain a
reporter gene, e.g., lacZ, Cat or green fluorescent protein (GFP), that lacks its own upstream
promoter and/or splice acceptor sequence. That is, promoter gene traps contain a reporter gene with a
splice site but no promoter. If the vector integrates within a gene and is spliced into the gene product,
then the reporter gene will be expressed. Enhancer traps contain a reporter gene and have a minimal
promoter which requires the activity of an enhancer in order to function. If the vector integrates near
an enhancer (whether in a gene or not), then the reporter gene will be expressed. Activation of the
reporter gene can only occur when the vector is integrated within an active host gene and generates a
fusion transcript with the host gene. The activity of a reporter gene provides an easy assay for
determining if a vector has been integrated into an expressed gene. Methods for detecting reporter
gene activity in transfected cells or tissues of a transgenic animal are well known in the art.

The mutagenic vector may be mapped using standard cytogenetic techniques, such as in situ
hybridization, wherein a labeled fragment comprising vector-specific sequence is used as a probe.
Co-localization of the probe with a particular locus of interest indicates that the associated gene is a
suitable candidate and should be subjected to further analysis. A gene that has been identified in this
manner can be cloned as described.

N. Diagnostic Indicators, Screens and Disease Symptoms

In another embodiment of the invention, there is provided a method of diagnosing or
determining susceptibility of a subject to joint space narrowing and/or osteophyte development
and/or joint pain. This method involves analyzing the genetic material of a subject to determine
which allele(s) of a gene is/are present. The method may include determining whether one or more
particular alleles are present, or which combination of alleles (i.e., a haplotype) is present. The
method may also include determining whether subjects are homozygous or heterozygous for a
particular allele or haplotype.
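
The determination of zygosity described above can be illustrated with a minimal sketch. It assumes the two alleles observed at a single polymorphic site are already known (for example, from sequencing); the function name `describe_genotype`, the tuple representation of a genotype, and the notion of a designated "risk allele" argument are illustrative assumptions and are not part of the disclosed method.

```python
from typing import Dict, Tuple, Union


def describe_genotype(
    genotype: Tuple[str, str], risk_allele: str
) -> Dict[str, Union[str, int]]:
    """Summarize a diploid genotype at one polymorphic site.

    `genotype` holds the two observed alleles (hypothetical representation);
    `risk_allele` is the allele of interest.
    """
    first, second = (allele.upper() for allele in genotype)
    risk = risk_allele.upper()
    # Count how many of the two chromosomes carry the allele of interest.
    copies = int(first == risk) + int(second == risk)
    # Identical alleles on both chromosomes means homozygous.
    zygosity = "homozygous" if first == second else "heterozygous"
    return {"zygosity": zygosity, "risk_allele_copies": copies}


if __name__ == "__main__":
    print(describe_genotype(("C", "T"), risk_allele="T"))
```

A haplotype could be handled the same way by passing haplotype labels instead of single-nucleotide alleles, provided phase has been resolved beforehand.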

In a preferred embodiment, the method comprises determining which allele of one or more
polymorphisms of the invention is/are present. In particular, the method may include determining the
presence of a polymorphism of a gene which, in combination with polymorphisms defined herein or
other polymorphisms, may define a risk haplotype. The polynucleotide sequences for these
particular alleles may be used for diagnostic purposes. The polynucleotides which may be used
include oligonucleotides, complementary RNA and DNA molecules and PNAs. The
polynucleotides may be used to determine whether subjects are homozygous or heterozygous for a
particular allele or haplotype making them susceptible to joint space narrowing and/or osteophyte
development and/or joint pain, and hence, osteoarthritis.

In one aspect, hybridization with a PCR probe which is capable of detecting a particular
polymorphism may be used to identify nucleic acid sequences of particular alleles or haplotypes.
These probes must be specific to these particular alleles, and the stringency of the hybridization or
amplification must be such that the probe identifies only this particular allele.

Means for producing specific hybridization probes for these polynucleotides of particular
alleles include the cloning of these polynucleotide sequences into vectors for the production of
mRNA probes. Such vectors are known in the art, are
commercially available, and may be used to synthesize RNA probes in vitro by means of the
addition of the appropriate RNA polymerases and the appropriate labeled nucleotides.
Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides
such as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via
avidin/biotin coupling systems, and the like.

Polynucleotides of particular alleles or haplotypes may be used in Southern or northern
analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and
multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect
susceptibility to joint space narrowing and/or osteophyte development and/or joint pain. Such
qualitative methods are well known in the art.

In a particular embodiment, polynucleotides of particular alleles or haplotypes may be used in
assays that detect susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, particularly those mentioned above. Polynucleotides complementary to sequences of a
particular allele or haplotype may be labeled by standard methods and added to a fluid or tissue
sample from a patient under conditions suitable for the formation of hybridization complexes. After






WO 03/054166 



PCT/US02/41225 



a suitable incubation period, the sample is washed and it is determined whether there is a signal. If a signal
is found, then the presence of the polynucleotide of a particular allele, alleles or haplotype in the
sample indicates susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, and hence, osteoarthritis. Such assays may also be used to determine the particular
therapeutic treatment regimen for an individual patient.

With respect to osteoarthritis, the presence of a particular polymorphism or polymorphisms
in a tissue sample from an individual may indicate a predisposition for joint space narrowing and/or
osteophyte development and/or joint pain, or may provide a means for detecting osteoarthritis prior
to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow
health professionals to employ preventative measures or aggressive treatment earlier, thereby
preventing the development or further progression of osteoarthritis.

Additional diagnostic uses for oligonucleotides designed from the polynucleotide sequences
of a particular allele or haplotype may involve the use of PCR. These oligomers may be chemically
synthesized, generated enzymatically, or produced in vitro. Oligomers will contain a fragment of a
polynucleotide of a particular allele, alleles or haplotype, or a fragment of a polynucleotide
complementary to the polynucleotide of a particular allele, alleles or haplotype, and will be employed
under optimized conditions for identification of a specific polymorphism, polymorphisms or
haplotype. Oligomers may also be employed under very stringent conditions for detection of these
particular DNA or RNA sequences.

In further embodiments, oligonucleotides or longer fragments derived from any of the
polynucleotides described herein may be used as elements on a microarray. The microarray can be
used in transcript imaging techniques to detect a particular polymorphism, polymorphisms or
haplotype simultaneously as described below. In particular, this information may be used to develop
a pharmacogenomic profile of a patient in order to select the most appropriate and effective
treatment regimen for that patient. For example, therapeutic agents which are highly effective and
display the fewest side effects may be selected for a patient based on his/her pharmacogenomic
profile.

Microarrays may be prepared, used, and analyzed using methods known in the art
(Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl.
Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116;
Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl.
Acad. Sci. USA 94:2150-2155; Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662). Various
types of microarrays are well known and thoroughly described in Schena, M., ed. (1999; DNA
Microarrays: A Practical Approach, Oxford University Press, London).

In another embodiment, a method involves the use of antibodies in diagnosing or determining
the susceptibility to joint space narrowing and/or osteophyte development and/or joint pain. The
antibodies would specifically bind to an epitope of a particular allele or form of the protein and may
be used to determine susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, and hence, osteoarthritis. Antibodies useful for diagnostic purposes may be prepared in
the same manner as described above. Diagnostic assays for determining susceptibility to joint space
narrowing and/or osteophyte development and/or joint pain include methods which utilize the
antibody and a label to detect a particular allele or form of the protein in human body fluids or in
extracts of cells or tissues. The antibodies may be used with or without modification, and may be
labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter
molecules are known in the art and may be used.

A variety of protocols for measuring a particular allele or form of the protein, including 
ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing susceptibility to 
joint space narrowing and/or osteophyte development and/or joint pain. 

O. Preparation of a Human Sample

The presence of an allelic form of a gene containing a sequence variation according to the
invention can be detected by testing any tissue of a human subject. Human samples that are useful
according to the invention include tissue or fluid samples containing a polynucleotide or polypeptide of
interest, including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external
secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs,
tissue and samples of in vitro cell culture constituents. Genomic DNA, cDNA or RNA can be
prepared from the human sample according to the methods described above.



P. Methods of Use 

1. Nucleic Acid Diagnosis and Diagnostic Kits

In order to detect the presence of an allele of a gene predisposing an individual to
osteoarthritis, a biological sample such as blood is prepared and analyzed for the presence or absence
of susceptibility alleles of a gene containing a polymorphism, according to the invention. Results of
these tests and interpretive information will be returned to the health care provider for communication
to the tested individual. Such diagnoses may be performed by diagnostic laboratories, or, alternatively,
diagnostic kits are manufactured and sold to health care providers or to private individuals for
self-diagnosis.

Initially, the screening method will involve amplification of the relevant gene sequences. In
another preferred embodiment of the invention, the screening method involves a non-PCR based
strategy. Such non-PCR based screening methods include Southern blot analysis to detect the
presence of a variant form of a gene in a sample comprising total genomic DNA from the individual
being tested. Alternatively, northern blot analysis can be used to detect an aberrant mRNA encoded
by a gene that exhibits altered stability or is the result of alternative splicing, in a sample comprising
RNA from an individual being tested. The methods of S1 nuclease analysis, RNase protection and
primer extension can also be used to determine both the endpoint and the amount of a gene-specific
mRNA (Ausubel et al., supra). Both PCR and non-PCR based screening strategies can detect target
sequences with a high level of sensitivity.

The preferred method, according to the invention, is target amplification. According to this
method, the target nucleic acid sequence is amplified with polymerases. One particularly preferred
method using polymerase-driven amplification is PCR (described above). The polymerase chain
reaction and other polymerase-driven amplification assays can achieve over a million-fold increase in
copy number through the use of polymerase-driven amplification cycles. PCR primers useful for target
amplification according to the invention will be designed to amplify a region of DNA containing one or
more polymorphisms. Allele-specific primers (comprising one or more polymorphisms) are also useful
for detecting gene sequence variations by PCR methodologies according to the invention. The absence
of a particular polymorphism will be indicated by the absence of an amplified product when the
amplification step is carried out in the presence of allele-specific primers. Once amplified, the resulting
nucleic acid can be sequenced and the specific sequence of the test DNA will be compared with the
wild type sequence by using the computer programs described in Section F entitled "Identification and
Characterization of Polymorphisms". Alternatively, the amplified product will be analyzed by Southern
blot assay with nucleic acid probes. Nucleic acid probes useful according to the invention will be
specifically hybridizable to a mutant form of a gene but not to the wild type gene due to the presence
of one or more polymorphisms.
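
The comparison of an amplified, sequenced test DNA with the wild type sequence can be sketched as follows. This is only an illustration and not the computer programs referred to in Section F: it assumes the two sequences are already aligned and of equal length, so that only single-base substitutions are reported. The function name `find_substitutions` and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_substitutions(wild_type: str, test: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, wild type base, test base) for each mismatch.

    Assumes pre-aligned sequences of equal length; insertions and deletions
    would require a real alignment step, which is outside this sketch.
    """
    if len(wild_type) != len(test):
        raise ValueError("sequences must be pre-aligned and of equal length")
    differences = []
    for index, (wt_base, test_base) in enumerate(zip(wild_type.upper(), test.upper())):
        if wt_base != test_base:
            differences.append((index + 1, wt_base, test_base))
    return differences


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    for position, wt_base, test_base in find_substitutions("ACGTAC", "ACGAAC"):
        print(f"position {position}: {wt_base} -> {test_base}")
```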

When a probe comprising the target sequence, according to the invention, is used to detect the
presence of the target sequences via non-PCR based strategies (for example, in screening for
osteoarthritis susceptibility), the biological sample to be analyzed, such as blood or serum, may be
treated, if desired, to extract the nucleic acids (as described above). The sample nucleic acids (isolated
from a biological sample or amplified by PCR) may be prepared in various ways to facilitate detection
of the target sequence, e.g., denaturation, restriction digestion, electrophoresis or dot blotting.
Preferably, the targeted region of the nucleic acids being analyzed is at least partially single-stranded
to form hybrids with the targeting sequence of the probe. If the sequence is naturally single-stranded,
denaturation will not be required. However, if the sequence is double-stranded, the sequence will
probably need to be denatured. Denaturation can be carried out by various techniques known in the
art.

To detect the presence of a sequence variation in a gene, according to the invention, analyte
nucleic acid and probe will be incubated under conditions which promote stable hybrid formation of the
target sequence in the probe with the putative targeted sequence in the sample DNA. If the region of
the probe which is used to bind to the analyte is designed to be completely complementary to the
targeted region, high stringency conditions are desirable in order to prevent false positives. However,
conditions of high stringency will be used only if the probes are complementary to regions of the
chromosome which are unique in the genome. The stringency of hybridization is determined by a
number of factors (described above). Detection, if any, of the resulting hybrid is usually accomplished
by the use of labeled probes. Alternatively, the probe may be unlabeled, but may be detectable by
specific binding with a ligand which is labeled, either directly or indirectly. Suitable labels, and methods
for labeling probes and ligands, are known in the art, and are described in Section C entitled "Production
of a Nucleic Acid Probe".

Accordingly, the foregoing screening method may be modified to identify individuals having a
gene containing a neutral polymorphism not associated with osteoarthritis, preferably by amplifying
DNA fragments of a gene derived from a particular individual. The amplified DNA fragments are
sequenced and the sequence is compared to the consensus gene sequence containing neutral
polymorphisms. At this time, differences between the individual's coding sequence for a gene and a
consensus sequence for the same gene are determined, wherein the presence of any neutral
polymorphisms and the absence of polymorphisms not previously identified as neutral polymorphisms
can be correlated with an absence of increased genetic susceptibility to osteoarthritis resulting from a
mutation in a gene coding sequence.

In another embodiment of the invention, detection of a polymorphism will be performed by
detecting loss of a restriction enzyme recognition site due to the presence of one or more
polymorphisms. According to this embodiment, a polymorphism will be detected with a polynucleotide
probe that is capable of detecting a restriction enzyme fragment containing the polymorphism, wherein
the fragment is of a size that can be easily separated on an agarose gel and visualized by Southern blot
analysis. A polynucleotide probe according to this embodiment of the invention can be specific for a
sequence within the candidate gene or outside of the candidate gene.
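
The loss of a restriction enzyme recognition site can be checked in silico before a Southern blot is designed. The sketch below is an illustration under stated assumptions: it searches one strand only (sufficient for palindromic sites such as the EcoRI site GAATTC), it takes a reference and a variant sequence of the same region, and the function names and example sequences are hypothetical.

```python
from typing import List


def site_positions(sequence: str, site: str) -> List[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    sequence, site = sequence.upper(), site.upper()
    width = len(site)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == site
    ]


def lost_sites(reference: str, variant: str, site: str) -> List[int]:
    """Positions where the site is present in `reference` but not in `variant`.

    Assumes the two sequences share the same coordinates (substitutions only).
    """
    in_reference = set(site_positions(reference, site))
    in_variant = set(site_positions(variant, site))
    return sorted(in_reference - in_variant)


if __name__ == "__main__":
    ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
    # Hypothetical sequences: an A-to-G substitution disrupts the site.
    print(lost_sites("TTGAATTCAA", "TTGAGTTCAA", ECORI_SITE))
```

A lost site means the two flanking restriction fragments merge into one larger fragment, which is the size difference a probe would reveal on the blot.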

It is also contemplated within the scope of this invention that the nucleic acid probe assays of
this invention will employ a mixture of nucleic acid probes capable of detecting a gene. Thus, in one
example to detect the presence of a gene in a test sample, more than one probe complementary to a
gene is employed and in particular the number of different probes is alternatively 2, 3, or 5 different
nucleic acid probe sequences. In another example, to detect the presence of mutations in the gene
sequence in a patient, more than one probe complementary to a gene is employed wherein the probe
mixture includes probes capable of binding to the allele-specific mutations identified in populations of
patients with alterations in a gene. In this embodiment, any number of probes can be used, and will
preferably include probes corresponding to the major gene mutations identified as predisposing an
individual to osteoarthritis.

Northern blot analysis, S1 nuclease analysis, RNase protection and primer extension (Ausubel
et al., supra) are also methods according to the invention for detecting changes in mRNA resulting
from the presence of one or more polymorphisms in the sequence of a gene.

Additionally, any of the methods of genotyping described in Section F entitled "Identification and
Characterization of Polymorphisms" can be used for diagnostics according to the invention.



2. Peptide Diagnosis and Diagnostic Kits 

Osteoarthritis can also be detected on the basis of an alteration of the wild-type polypeptide.
Such alterations can be determined by sequence analysis in accordance with conventional techniques.
More preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the
absence of, peptides derived from a gene of interest. The antibodies may be prepared as described
above in Section I entitled "Preparation of Antibodies". Preferably, antibodies will immunoprecipitate
the protein product of a gene from solution as well as react with the protein product of a gene on
Western or immunoblots of polyacrylamide gels. Antibodies useful according to the invention will also
detect the protein product of a gene in paraffin or frozen tissue sections, using immunocytochemical
techniques.

Preferred embodiments relating to methods for detecting wild type or mutant forms of the
protein product of a gene include enzyme linked immunosorbent assays (ELISA), radioimmunoassay
(RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich
assays using monoclonal and/or polyclonal antibodies. Exemplary sandwich assays are described by
David et al. in U.S. Pat. Nos. 4,376,110 and 4,486,530, hereby incorporated by reference.

3. Drug Screening

This invention is particularly useful for screening therapeutic compounds by using the mutant 
gene or protein product or binding fragment of the gene in any of a variety of drug screening 
techniques. 

The protein product or fragment of a gene employed in such a test may either be free in
solution, affixed to a solid support, expressed on the surface of a cell, or located intracellularly. One
method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed
with a recombinant polynucleotide expressing the polypeptide or fragment, preferably in competitive
binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. In
particular, these cells can be used to measure formation of a complex comprising the protein product
or fragment of a gene and the agent being tested. Alternatively, these cells can be used to determine if
the formation of a complex between the protein product or fragment of a gene and a known ligand is
interfered with by an agent being tested.

Thus, the present invention discloses methods useful for drug screening wherein such methods
comprise contacting a candidate drug with a polypeptide or fragment derived from a gene and
assaying (i) for the presence of a complex between the drug and the polypeptide or fragment
derived from a gene, or (ii) for the presence of a complex between the polypeptide or fragment
derived from a gene and a ligand, by methods well known in the art. Preferably, the polypeptide or
fragment derived from a gene is labeled for use in competitive binding assays. Methods for producing
a labeled protein by in vitro translation are described in Section J entitled "Preparation of a Labeled
Protein". Free polypeptide or fragment will be separated from that present in a protein:protein
complex, and the amount of free (i.e., uncomplexed) label will be used as a measure of the binding of
the test drug to the polypeptide or the ability of the test drug to interfere with protein:ligand binding.

Another method of drug screening allows for high throughput screening for compounds
exhibiting suitable binding affinity to the polypeptides and is described in detail in Geysen, WO
84/03564. According to this method, large numbers of different small peptide test compounds are
synthesized on a solid substrate, such as plastic pins or another suitable surface. The peptide
test compounds are reacted with the polypeptides or peptide fragments derived from a gene, and
washed. Bound polypeptide is then detected by methods well known in the art.




Purified protein can be coated directly onto plates for use in the aforementioned drug 
screening techniques. Alternatively, non-neutralizing antibodies to the polypeptide can be used to 
capture the polypeptide or peptide fragment of interest and immobilize it on the solid support. 

Competitive drug screening assays in which neutralizing antibodies capable of specifically
binding the polypeptide of interest compete with a test compound for binding to the polypeptide of
interest or fragments thereof are also useful according to the invention. According to this method,
antibodies can be used to detect the presence of any test peptide which shares one or more antigenic
determinants with the polypeptide of interest.

An additional technique for drug screening involves the use of host eukaryotic cell lines or
cells (such as described above) which have a gene that produces a defective protein. According to
this method, the host cell lines or cells are grown in the presence of a test drug compound. The rate of
growth of the host cells is measured to determine if the compound is capable of regulating the growth
of cells expressing a nonfunctional protein product of a gene. Alternatively, the ability of the test
compound to restore the function of the mutant gene protein can be measured by using an appropriate
in vitro assay for function of the protein product of a gene. Suitable in vitro functional assays are
described in Section F entitled "Identification and Characterization of Polymorphisms". If the host cell
lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular
localization, the ability of the test compound to alter the cellular localization of the protein will be
determined. Changes in the cellular localization of a protein of interest will be detected by performing
cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of
a protein of interest can be determined by immunocytochemical methods well known in the art.

A method of drug screening may involve the use of host eukaryotic cell lines or cells
(described above) which have an altered gene that demonstrates an aberrant pattern of expression.
By aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or
the temporal pattern of expression is different from that of the wild type gene. The ability of a test
drug to alter the expression of a mutant form of a gene can be measured by Northern blot analysis, S1
nuclease analysis, primer extension or RNase protection assays. Alternatively, if a mutant form of a
gene contains a polymorphism in the promoter region of a gene, cells can be engineered to express a
reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g.,
CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test
compound and the ability of a test compound to alter the level of activity of the mutant gene promoter
can be determined by standard assays for each reporter gene which are well known in the art.

Candidate Drugs

A "candidate drug" as used herein, is any compound with a potential to modulate a phenotype 
associated with a particular disease according to the invention. 

A candidate drug is tested in a concentration range that depends upon the molecular weight of
the drug and the type of assay. For example, for inhibition of protein/protein complex formation, small
molecules (as defined below) may be tested in a concentration range of 1 pg - 100 mg/ml, preferably at
about 100 pg - 10 ng/ml; large molecules, e.g., peptides, may be tested in the range of 10 ng - 100
mg/ml, preferably 100 ng - 10 mg/ml.

Candidate drug compounds from large libraries of synthetic or natural compounds can be
screened. Numerous means are currently used for random and directed synthesis of saccharide,
peptide, and nucleic acid based compounds. Synthetic compound libraries are commercially available
from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex
(Princeton, NJ), Brandon Associates (Merrimack, NH), and Microsource (New Milford, CT). A rare
chemical library is available from Aldrich (Milwaukee, WI). Combinatorial libraries are available and
can be prepared. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant
and animal extracts are available from, e.g., Pan Laboratories (Bothell, WA) or MycoSearch (NC), or
are readily producible by methods well known in the art. Additionally, natural and synthetically
produced libraries and compounds are readily modified through conventional chemical, physical, and
biochemical means.

Useful compounds may be found within numerous chemical classes, though typically they are
organic compounds, and preferably small organic compounds. Small organic compounds have a
molecular weight of more than 50 yet less than about 2,500 daltons, preferably less than about 750
daltons, more preferably less than about 350 daltons. Exemplary classes include heterocycles,
peptides, saccharides, steroids, and the like. The compounds may be modified to enhance efficacy,
stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to
identify, generate, or screen additional agents. For example, where peptide agents are identified, they
may be modified in a variety of ways to enhance their stability, such as using an unnatural amino acid,
such as a D-amino acid, particularly D-alanine, or by functionalizing the amino or carboxylic terminus,
e.g., for the amino group, acylation or alkylation, and for the carboxyl group, esterification or
amidification, or the like.
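
The molecular weight ranges given above for small organic compounds can be applied as a simple library pre-filter. The sketch below is illustrative only: the text's limits are approximate ("about"), so treating them as hard cutoffs is an assumption, and the function name `size_tier`, the tier labels and the example weights are hypothetical.

```python
def size_tier(molecular_weight_da: float) -> str:
    """Classify a compound by molecular weight (daltons).

    Uses the ranges stated in the text as hard cutoffs (an assumption):
    more than 50 and less than 2,500 Da overall, preferably below 750 Da,
    more preferably below 350 Da.
    """
    if not 50 < molecular_weight_da < 2500:
        return "outside small-molecule range"
    if molecular_weight_da < 350:
        return "more preferred"
    if molecular_weight_da < 750:
        return "preferred"
    return "acceptable"


if __name__ == "__main__":
    # Hypothetical molecular weights for illustration only.
    for weight in (180.2, 500.0, 1200.0, 3000.0):
        print(weight, size_tier(weight))
```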

Determination of Activity of a Drug 

A candidate drug, assayed according to the invention as described above, is determined to be
effective if its use results in a change of about 10% of a phenotype associated with a disease
according to the invention.

The level of modulation by a candidate modulator of a phenotype associated with a disease
according to the invention may be quantified using any acceptable limits, for example, via the
following formula, which describes detections performed with a radioactively labeled probe (e.g., a
radiolabeled antibody in an immunobinding experiment or a radiolabeled nucleic acid probe in a
Northern hybridization).

$$
\text{Percent Modulation} = \frac{\text{CPM}_{\text{Control}} - \text{CPM}_{\text{Sample}}}{\text{CPM}_{\text{Control}}} \times 100
$$

where $\text{CPM}_{\text{Control}}$ is the average of the cpm in antibody/ligand complexes or on Northern blots
resulting from assays that lack the candidate modulator (in other words, untreated controls), and
$\text{CPM}_{\text{Sample}}$ is the cpm in antibody/ligand complexes or on Northern blots resulting from assays
containing the candidate modulator. A similar calculation is performed where the assay comprises use
of a labeling system or system of measuring enzymatic activity in which there is a linear relationship
between the amount of label detected and the amount of protein or nucleic acid being represented per
unit of label or the amount of protein or nucleic acid represented by a unit of enzymatic activity.
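
The percent modulation calculation can be sketched as below. The control value is averaged over replicate untreated assays, as the definition of the control cpm requires. Interpreting the "change of about 10%" criterion as an absolute modulation of at least 10 is an assumption made for this illustration, and the function names and cpm values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_modulation(control_cpm: Sequence[float], sample_cpm: float) -> float:
    """Percent modulation = (CPM_control - CPM_sample) / CPM_control x 100.

    `control_cpm` holds replicate counts from assays lacking the candidate
    modulator; their average is used as CPM_control.
    """
    cpm_control = mean(control_cpm)
    if cpm_control <= 0:
        raise ValueError("average control cpm must be positive")
    return 100.0 * (cpm_control - sample_cpm) / cpm_control


def meets_threshold(modulation: float, threshold: float = 10.0) -> bool:
    """Assumed reading of 'a change of about 10%': |modulation| >= threshold."""
    return abs(modulation) >= threshold


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    modulation = percent_modulation([1000, 1200, 800], sample_cpm=850)
    print(f"{modulation:.1f}% modulation, effective: {meets_threshold(modulation)}")
```

A positive value indicates that the candidate modulator reduced the signal relative to untreated controls; a negative value indicates an increase.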

4. Rational Drug Design

Rational drug design is useful for producing either structural analogs of biologically active
polypeptides of interest or small molecules with which polypeptides of interest interact (e.g., agonists,
antagonists, inhibitors) in order to design drugs which are, for example, more active or stable forms of
the polypeptide, or which enhance or interfere with the function of a polypeptide in vivo. See, e.g.,
Hodgson, 1991, BioTechnology, 9:19. According to one method of rational drug design, the
three-dimensional structure of a protein of interest (e.g., the polypeptide product of the gene), or the
complex comprising the protein product of a gene in association with its ligand, is determined by x-ray
crystallography, by computer modeling or, most typically, by a combination of approaches.
Alternatively, useful information regarding the structure of a polypeptide may be obtained by modeling
based on the structure of homologous proteins. Rational drug design has been used successfully in the
development of HIV protease inhibitors (Erickson et al., 1990, Science, 249: 527).

Rational drug design may also involve the analysis of peptides derived from the protein
product of a gene by an alanine scan (Wells, 1991, Methods in Enzymol., 202: 390). According to this
method, each of the amino acid residues of the peptide is sequentially replaced by alanine, and the
effect of this amino acid substitution on the peptide's activity is determined. This technique can be
used to determine the functionally relevant regions of the peptide.
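
The set of variants required for an alanine scan can be enumerated as in the following sketch. It assumes the peptide is given in one-letter amino acid code; skipping positions that are already alanine is a simplification made here, and the function name `alanine_scan` and the example peptide are hypothetical.

```python
from typing import List, Tuple


def alanine_scan(peptide: str) -> List[Tuple[int, str, str]]:
    """List single-residue alanine substitutions of a peptide.

    Returns (1-based position, original residue, variant sequence) for each
    position. Residues that are already alanine are skipped (an assumption),
    since replacing them would reproduce the original peptide.
    """
    peptide = peptide.upper()
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:index] + "A" + peptide[index + 1:]
        variants.append((index + 1, residue, variant))
    return variants


if __name__ == "__main__":
    # Hypothetical peptide for illustration only.
    for position, residue, variant in alanine_scan("MAKR"):
        print(f"{residue}{position}A: {variant}")
```

Each variant would then be assayed for activity, and positions where substitution reduces activity mark the functionally relevant regions.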

Another experimental approach to rational drug design will involve the isolation of a
target-specific antibody (selected by a functional assay) and the determination of the crystal structure of this
antibody. Theoretically, this approach will yield a pharmacore upon which subsequent drug design can
be based. Alternatively, if anti-idiotypic antibodies (anti-ids) specific for a functional,
pharmacologically active antibody are generated, there is no need to determine the crystallographic
structure of the target-specific antibody. It is expected that the binding site of the anti-ids will be an
analog of the original receptor. The anti-id could then be used to identify and isolate potentially
therapeutic peptides from banks of chemically or biologically produced peptides. These
selected peptides would then function as pharmacores.

According to these methods it may be possible to design drugs which demonstrate increased
activity or stability of the protein product of a gene or which function as inhibitors, agonists,
antagonists, etc. of the activity of a protein product of a gene. The availability of cloned gene
sequences, including polymorphisms, ensures that sufficient amounts of the polypeptide product of a
gene are available to facilitate analytical studies such as x-ray crystallography. Furthermore, the
knowledge of the sequence of the protein product of a gene provided herein will guide those using
computer modeling techniques in place of, or in addition to, x-ray crystallography.

5. Gene Therapy 

The present invention also provides a method of supplying wild-type gene function to a cell
which carries a mutant allele of a gene. By replacing a mutant gene with a wild-type gene, it may be
possible to reverse the symptoms of osteoarthritis in the recipient cells. A full-length version of the
wild-type gene, or a fragment of the gene, may be introduced into the cell in a vector such that the
gene remains extrachromosomal and is expressed by the cell from the extrachromosomal location.
More preferably, following introduction into the mutant cell, the wild-type gene or gene fragment
should recombine with the endogenous mutant gene already present in the cell. Such recombination
requires a double recombination event which results in the correction of the gene mutation. Vectors
for introduction of genes both for recombination and for extrachromosomal maintenance are known in
the art, and any suitable vector may be used. Methods for introducing DNA into cells such as
electroporation, calcium phosphate coprecipitation and lipofection are known in the art (described
above). Cells transformed with the wild-type gene can be used as model systems to study changes in
the intensity of symptoms associated with osteoarthritis and drug treatments which promote such
changes.

As generally discussed above, a gene or a fragment thereof, where applicable, may be used in
gene therapy methods in order to increase the amount of the expression products of such genes in
cells of patients with osteoarthritis. It may also be useful to increase the level of expression of a gene
even in those cells in which the mutant gene is expressed at a "normal" level, but the gene product is
not fully functional.

In other embodiments of the invention it may be useful to increase the amount of the
expression products of a mutant form of a gene in a cell that expresses the wild-type protein. Gene
therapy can be carried out according to generally accepted methods, for example, as described by
Friedman (1991, in Therapy for Genetic Diseases, T. Friedman ed., Oxford University Press, pp. 105-
121). Initially, the appropriate cells from a patient with osteoarthritis would be analyzed by the
diagnostic methods described above, to determine the level of production of a polypeptide from a gene
and the activity of a polypeptide product of a gene. A virus or plasmid vector (see further details
below), comprising a copy of a gene and suitable expression control elements, and capable of
replicating inside the cells, will be prepared. Suitable vectors are known and are disclosed in U.S. Pat.
No. 5,252,479 and PCT published application WO 93/07282. The vector will be injected into the
patient, either locally at an appropriate site according to the invention or systemically.

Gene transfer systems known in the art may be useful in the practice of the gene therapy
methods of the present invention. These include viral and nonviral transfer methods. A number of
viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak et al.,
1992, J. Gen. Virol., 73:1533), adenovirus (Berkner, 1992, Curr. Top. Microbiol. Immunol., 158:39;
Berkner et al., 1988, BioTechniques, 6:616; Gorziglia and Kapikian, 1992, J. Virol., 66:4407; Quantin et
al., 1992, Proc. Natl. Acad. Sci. USA, 89:2581; Rosenfeld et al., 1992, Cell, 68:143; Wilkinson et al.,
1992, Nucleic Acids Res., 20:2233; Stratford-Perricaudet et al., 1990, Hum. Gene Ther., 1:241),
vaccinia virus (Moss, 1992, Curr. Top. Microbiol. Immunol., 158:25), adeno-associated virus
(Muzyczka, 1992, Curr. Top. Microbiol. Immunol., 158:97; Ohi et al., 1990, Gene, 89:279),
herpesviruses including HSV and EBV (Margolskee, 1992, Curr. Top. Microbiol. Immunol., 158:67;
Johnson et al., 1992, J. Virol., 66:2952; Fink et al., 1992, Hum. Gene Ther., 3:11; Breakefield and Geller,
1987, Mol. Neurobiol., 1:337; Freese et al., 1990, Biochem. Pharmacol., 40:2189), and retroviruses of
avian (Bandyopadhyay and Temin, 1984, Mol. Cell. Biol., 4:749; Petropoulos et al., 1992, J. Virol.,
66:3391), murine (Miller, 1992, Curr. Top. Microbiol. Immunol., 158:1; Miller et al., 1985, Mol. Cell.
Biol., 5:431; Sorge et al., 1984, Mol. Cell. Biol., 4:1730; Mann and Baltimore, 1985, J. Virol., 54:401;
Miller et al., 1988, J. Virol., 62:4337), and human origin (Shimada et al., 1991, J. Clin. Invest., 88:1043;
Helseth et al., 1990, J. Virol., 64:2416; Page et al., 1990, J. Virol., 64:5370; Buchschacher and
Panganiban, 1992, J. Virol., 66:2731). Most human gene therapy protocols have been based on
disabled murine retroviruses.

Nonviral gene transfer methods known in the art include chemical techniques such as calcium
phosphate coprecipitation (Graham and van der Eb, 1973, Virology, 52:456; Pellicer et al., 1980,
Science, 209:1414); mechanical techniques, for example microinjection (Anderson et al., 1980, Proc.
Natl. Acad. Sci. USA, 77:5399; Gordon et al., 1980, Proc. Natl. Acad. Sci. USA, 77:7380; Brinster
et al., 1981, Cell, 27:223; Costantini and Lacy, 1981, Nature, 294:92); membrane fusion-mediated
transfer via liposomes (Felgner et al., 1987, Proc. Natl. Acad. Sci. USA, 84:7413; Wang and Huang,
1989, Biochemistry, 28:9508; Kaneda et al., 1989, J. Biol. Chem., 264:12126; Stewart et al., 1992,
Hum. Gen. Ther., 3:267; Nabel et al., 1990, Science, 249:1285; Lim et al., 1992, Circulation, 83:2007);
and direct DNA uptake and receptor-mediated DNA transfer (Wolff et al., 1990, Science, 247:1465;
Wu et al., 1991, J. Biol. Chem., 266:14338; Zenke et al., 1990, Proc. Natl. Acad. Sci. USA, 87:3655;
Wu et al., 1989b, J. Biol. Chem., 264:16985; Wolff et al., 1991, BioTechniques, 11:474; Wagner et al.,
1990, Proc. Natl. Acad. Sci. USA, 87:3410; Wagner et al., 1991, Proc. Natl. Acad. Sci. USA, 88:4255;
Cotten et al., 1990, Proc. Natl. Acad. Sci. USA, 87:4033; Curiel et al., 1991a, Proc. Natl. Acad.
Sci. USA, 88:8850; Curiel et al., 1991b, Hum. Gene Ther., 3:147).

In an approach which combines biological and physical gene transfer methods, plasmid DNA
of any size is combined with a polylysine-conjugated antibody specific to the adenovirus hexon protein,
and the resulting complex is bound to an adenovirus vector. The trimolecular complex is then used to
infect cells. The adenovirus vector permits efficient binding, internalization, and degradation of the
endosome before the coupled DNA is damaged.

Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene
transfer. While in standard liposome preparations the gene transfer process is nonspecific, localized in
vivo uptake and expression have been reported in tumor deposits, for example, following direct in situ
administration (Nabel, 1992, Hum. Gen. Ther., 3:399).

Gene transfer techniques which target DNA directly to an appropriate tissue, e.g., a tissue
that normally expresses the protein product of the candidate gene of the invention, are preferred.
Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA (usually in
the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine. Ligands are
chosen on the basis of the presence of the corresponding ligand receptors on the cell surface of the
target cell/tissue type. These ligand-DNA conjugates can be injected directly into the blood if desired
and are directed to the target tissue where receptor binding and internalization of the DNA-protein
complex occur. To overcome the problem of intracellular destruction of DNA, coinfection with
adenovirus can be included to disrupt endosome function.

6. Peptide Therapy 

Peptides which have gene activity can be supplied to cells which carry mutant or missing
alleles of a gene. Alternatively, peptides specific for a mutant form of the protein product of a gene
can be supplied to cells carrying a wild-type protein. The protein product of a gene can be produced by
expression of the cDNA sequence in bacteria, for example, using known expression vectors (as
described in Section H entitled "Production of a Mutant Protein"). Alternatively, the protein product of
a gene can be extracted from mammalian cells engineered to produce the protein product of a gene of
interest. In addition, the techniques of synthetic chemistry can be employed to synthesize the protein
product of a gene. Any of the above techniques can provide a preparation of protein product of a gene
that is substantially free of other human proteins. This is most readily accomplished by carrying out
protein synthesis in a microorganism or in vitro.

Active gene molecules can be introduced into cells by microinjection or by the use of
liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively or by
diffusion. Extracellular application of the protein product of a gene may be sufficient to decrease or
reverse the physiological effects of osteoarthritis. Other molecules with the activity of a protein
product of a gene (for example, peptides, drugs or organic compounds) may also be used to effect
such a reversal. Modified polypeptides having substantially similar function may also be useful for
peptide therapy.

7. Transformed Hosts

Cells and animals which carry a mutant allele of a gene can be used as model systems to
study and test for substances which have potential as therapeutic agents. Following application of a
test substance to the cells, the phenotype of the cell will be determined. Any variety of phenotypic
changes associated with osteoarthritis can be assessed, including insulin resistance and combined
insulin resistance/insulin secretion defect. Assays for each of these traits are known in the art.

Animals useful for testing therapeutic agents can be selected after mutagenesis of whole
animals or after treatment of germline cells or zygotes. Such treatments include insertion of mutant
alleles of a gene, usually from a second animal species, as well as insertion of disrupted homologous
genes. Alternatively, the endogenous gene of the animals may be disrupted by insertion or deletion
mutation or other genetic alterations using conventional techniques (Capecchi, 1989, Science,
244:1288; Valancius and Smithies, 1991, Mol. Cell. Biol., 11:1402; Hasty et al., 1991, Nature, 350:243;
Shinkai et al., 1992, Cell, 68:855; Mombaerts et al., 1992, Cell, 68:869; Philpott et al., 1992, Science,
256:1448; Snouwaert et al., 1992, Science, 257:1083; Donehower et al., 1992, Nature, 356:215).

Following the administration of test substances, the physiological changes associated with osteoarthritis 
will be assessed. If the test substance prevents or suppresses any of these physiological changes, then 
the test substance will be considered a candidate therapeutic agent for the treatment of osteoarthritis. 
These animal models provide an extremely important testing vehicle for potential therapeutic products. 

8. Use of a Polynucleotide as a Unique Sequence Marker: 

Polynucleotides can be used to mark objects or substances for the purposes of later
identification. Thus, polynucleotides of the invention are useful for tracking the manufacture and
distribution of a large number of diverse substances, including but not limited to: (1) natural resources
such as animals, plants, oil, minerals, and water; (2) chemicals such as drugs, solvents, petroleum
products, and explosives; (3) commercial by-products including pollutants such as radioactive or other
hazardous waste; and (4) articles of manufacture such as guns, typewriters, automobiles and
automobile parts. A nucleic acid according to the invention, when used as a marker, thus aids in the
determination of product identity and so provides information useful to manufacturers and consumers.

Polynucleotides have the advantage over other marking materials of being readily amplifiable
through the use of polymerase chain reaction (PCR) technology. The method of PCR is well known in
the art. PCR is performed as described by Mullis & Faloona, 1987, Methods Enzymol., 155:335, herein
incorporated by reference. It is the unique sequence of a polynucleotide which renders it useful as a
marker, since the sequence, or a characteristic pattern derived from its sequence, confers a property
on the polynucleotide which permits it to be tracked.

It is contemplated that a novel polynucleotide sequence of the invention, or fragments or 
derivatives of it may be used as markers by their attachment to or mixture in objects or substances to 
be marked. Methods for marking various classes of substances and later detection of the tags in those 
substances are disclosed in U.S. Patent Nos. 5,451,505 and 5,643,728.

Briefly, the use of a polynucleotide of the invention as a marker may entail combining a
polynucleotide with the substance or object to be marked, using methods appropriate to that substance
or object; and detecting the marker through amplification of the polynucleotide sequence using PCR
technology, followed by either sequence analysis or identification by other means known in the art
(e.g., hybridization assays).

The methods of applying a marker nucleic acid to a substance or object and subsequent
detection of that nucleic acid will vary depending upon the nature of the substance or object and the
environment to which it will be exposed. For example, inert solids such as paper, many pharmaceutical
products, wood, some foodstuffs, etc., can be either processed with the marker nucleic acid, or the
nucleic acid may be sprayed onto their surfaces. Chemically active substances, such as foodstuffs
with enzymatic activity, polymers with charged groups, or acidic pharmaceuticals may require that a
protective composition (e.g., liposomes) be added to the nucleic acid being used as a marker.

In order to mark liquids, the nucleic acid may be mixed directly with the liquid, or, if the
chemical nature of the liquid is not compatible with this approach (i.e., nucleic acids are not soluble in
the liquid), the nucleic acid may be mixed with a detergent to enhance its solubility. Containerized
gases may be marked simply by adding a nucleic acid to the container in dry form, as it will be
dispersed throughout the gas as the gas is released.

The amount of nucleic acid to add to a substance as a marker will also vary with the given
situation, as will the detection strategy. PCR technology, however, allows the amplification and
detection of as little as one molecule from a sample. Other means of detection, such as hybridization
assays, require that more nucleic acid be recovered from a sample to efficiently detect it. PCR can be
combined with a hybridization assay, however, to enhance the sensitivity of the method.

A nucleic acid sequence used as a marker will generally be from 20 to 1,000 bases long, and
preferably will be 60 to 1,000 bases long when PCR is to be used to detect the marker.

One example of a substance for which nucleic acid marking is suited is gunpowder. Marked
gunpowder may be prepared as follows: 1) add 16 ng of nucleic acid bearing the chosen marker
sequence (derived from a polynucleotide of the invention) to 1 ml of distilled water; 2) mix the solution
of nucleic acid with 1 g of nitrocellulose-based gunpowder; and 3) dry in air or under vacuum at 85°C.
To recover the marker from gunpowder: 1) wash the gunpowder sample with 1 ml of distilled water;
2) add 50 µl of the wash solution to a standard PCR mix, or, alternatively, place gunpowder flakes
directly into a 100 µl PCR mix; and 3) amplify according to standard PCR methods using primers
which anneal at opposite ends and on opposite strands of the sequence used as a marker (annealing
and extension conditions will depend upon the exact sequences chosen for oligonucleotide primers, and
may be adjusted according to methods known in the art).
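
The requirement that the primers anneal at opposite ends and on opposite strands of the marker can be illustrated with a short sketch. The function names and the default primer length of 20 bases are assumptions chosen for illustration; only the 60 to 1,000 base marker length for PCR detection is taken from the description above. The sketch does not assess melting temperature, secondary structure, or primer-dimer formation, which would be handled by methods known in the art.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def design_primers(marker: str, primer_length: int = 20) -> tuple[str, str]:
    """Pick one primer at each end of a marker sequence, on opposite strands.

    marker is the top strand written 5' to 3'. The forward primer copies the
    5' end of the top strand; the reverse primer is the reverse complement of
    the 3' end, so it anneals to the top strand and extends toward the
    forward primer. primer_length is a hypothetical default.
    """
    marker = marker.upper()
    if set(marker) - set("ACGT"):
        raise ValueError("marker must contain only A, C, G and T")
    if not 60 <= len(marker) <= 1000:
        raise ValueError("marker for PCR detection should be 60 to 1,000 bases")
    if primer_length < 1 or 2 * primer_length > len(marker):
        raise ValueError("primers would overlap or be empty")
    forward = marker[:primer_length]
    reverse = reverse_complement(marker[-primer_length:])
    return forward, reverse


if __name__ == "__main__":
    # Artificial 60-base marker used only to demonstrate the call.
    forward, reverse = design_primers("A" * 30 + "C" * 30)
    print("forward primer:", forward)
    print("reverse primer:", reverse)
```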

Another example of a substance which may be marked with a nucleic acid according to the
invention is ink. To prepare a marked ink sample: if the ink is water insoluble, mix the nucleic acid
with detergents as for oil; if the ink is water soluble, add nucleic acid directly to the ink to a
concentration of about 1 to 20 ng per ml. To recover the marker from ink, proceed as for oils and
medicines.

In the above examples, the presence of an amplification product of the proper size (visualized,
for example, by gel electrophoresis alongside nucleic acid size markers followed by ethidium bromide
staining of the gel, according to standard methods) will indicate the presence of the marker in the
sample. In some instances, the PCR product may be further subjected to hybridization analysis or to
sequencing to enhance the accuracy of the method. A method of hybridization analysis which can be
used is described herein.

9. Use of a Polynucleotide of the Invention as a Marker for Chromosome Mapping:

Because a polynucleotide of the invention is novel (that is, its sequence is unique), it is useful
as a marker for chromosomal mapping. There are a number of methods of chromosomal mapping
known in the art. Prominent among them is the variant of the in situ hybridization technique known as
"Fluorescence In Situ Hybridization", or FISH. Details of methods and solutions used for in situ
hybridization are well known in the art. There are many variations of the FISH technique itself;
however, the basic approach is similar in each case. Essentially, in situ hybridization of cells, nuclei, or
metaphase chromosome spreads is performed with a polynucleotide probe either directly labeled with
a fluorochrome, or labeled with a moiety which will be bound by a fluorochrome-tagged entity. The
hybridized probe is visualized by irradiation of the sample with light in the wavelength which excites
fluorescence from the fluorochrome. When combined with standard methods of karyotyping known in
the art, this method allows the polynucleotide sequence to be localized to a particular arm of a
particular chromosome. Once mapped to a specific chromosome, the location of the novel
polynucleotide sequence on that chromosome may be further localized by in situ hybridization along
with probes specific for known genes or sequences, labeled with other fluorescent tags which allow
the differentiation of the signals from the different probes. Such an approach and various adaptations
of it allow the localization of the novel gene relative to a known gene. Methods of generating and
using fluorescence-labeled polynucleotide probes for FISH and chromosome mapping are known in
the art (for example, see Malcolm et al., 1981, Ann. Hum. Genet., 45:134; Bar-Am et al., 1992,
Genes, Chromosomes & Cancer, 4:314; Pinkel et al., 1988, Proc. Natl. Acad. Sci. USA, 85:9138;
U.S. Patent No. 5,728,527). Additional variations of the chromosome mapping method utilize a PCR
approach (Dionne et al., 1990, BioTechniques, 8(2):190 and Iggo et al., 1989, Proc. Natl. Acad. Sci.
USA, 86:6211).

In addition to being able to determine the chromosomal location of the novel polynucleotide,
similar technology, in which FISH is combined with flow cytometry, will allow the polynucleotide of
the invention to be used to sort chromosomes, nuclei, or whole cells containing various dosages (i.e.,
gene copy numbers) of the gene encoding that polynucleotide (Hultdin et al., 1998, Nuc. Acids Res.,
26:3651). The novel polypeptide may also be useful as a diagnostic indicator of a disease, including but
not limited to those listed in Table I (Kuo et al., 1990, Am. J. Hum. Genet., 47:A119).

10. Use of a Polynucleotide of the Invention as a Marker for Analysis of Forensic 
Materials 

Forensic science depends heavily on methods for determining the source of various
compounds associated with criminal activity. In particular, the identification of individuals involved in
criminal activity through analysis of substances found at the crime scenes is critical. Such identification
is possible with genetic typing, which involves the determination of the genotype of an individual with
regard to loci which are polymorphic within the population. As used herein, "polymorphic" refers to a
gene or other segment of DNA which shows nucleotide sequence variability from individual to
individual. The use of PCR techniques and nucleotide probes to detect even single nucleotide changes
in a polynucleotide sequence has revolutionized the field of forensic serology (see Reynolds and
Sensabaugh, 1991, Anal. Chem., 63:2). For an example of polymorphisms useful for forensic
identification and methods of typing samples with regard to those polymorphisms, see U.S. Patent No.
5,273,883.

If a polynucleotide of the invention is found to have nucleotide sequence variation among
individuals within a population, it may be useful in the analysis of forensic samples. There are a
number of methods known to those skilled in the art for typing nucleic acids with regard to
polymorphisms. It should be understood that any such method is acceptable according to the invention.
One particular method is termed the "reverse dot blot" method. The basic steps involved are: 1)
oligonucleotides bearing the sequences of various polymorphic forms of the polynucleotide region to be
analyzed are bound to membranes; 2) labeled, PCR-amplified fragments, derived from the sample to
be genotyped, and corresponding to the polymorphic region ("target DNA") are allowed to hybridize to
the bound oligonucleotides under conditions which only allow the hybridization of molecules with 100%
complementary sequences; 3) unbound target DNA is removed; and 4) hybridized molecules are
detected.
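
The logic of the reverse dot blot steps above can be sketched in silico. This is an idealized illustration, not a description of laboratory practice: hybridization is modeled as an exact match between a membrane-bound probe and a perfectly complementary stretch of a target strand, and the function names, allele labels, and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def reverse_dot_blot(bound_probes: dict[str, str],
                     target_fragments: list[str]) -> set[str]:
    """Return the labels of probes that capture at least one target strand.

    bound_probes maps an allele label to the probe sequence fixed on the
    membrane (step 1). target_fragments are the labeled, amplified strands
    from the sample (step 2). A probe scores as hybridized only if some
    target strand contains the probe's exact reverse complement, which
    models the 100% complementarity condition. Strands matching no probe
    are simply ignored, standing in for the wash (step 3); the returned
    set stands in for detection (step 4).
    """
    detected = set()
    for allele, probe in bound_probes.items():
        required = reverse_complement(probe.upper())
        if any(required in fragment.upper() for fragment in target_fragments):
            detected.add(allele)
    return detected


if __name__ == "__main__":
    # Two hypothetical probes differing at a single base.
    probes = {"allele_A": "GATTACA", "allele_G": "GATTGCA"}
    sample = ["CCTGTAATCGG"]  # contains the complement of allele_A only
    print(sorted(reverse_dot_blot(probes, sample)))
```

A sample that lights up two probes would be read as heterozygous for the two corresponding polymorphic forms.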

The specific genotype of the individual from whom the target sample was obtained 
(amplified), with regard to the polymorphic region of a polynucleotide of the invention, may thus be 
determined by screening a panel of probes containing the known polymorphic sequence variations of
that region. It should be understood that the hybridization conditions may be adjusted by one of skill in
the art so that limited amounts of non-complementarity, including single base mismatches, may be 
detected with this method. 

Q. Pharmaceutical Compositions-Prevention and Treatment

1. Administration of Pharmaceutical Compositions

Administration of pharmaceutical compositions is accomplished orally or parenterally.
Methods of parenteral delivery include topical, intra-arterial (directly to the tumor), intramuscular,
subcutaneous, intramedullary, intrathecal, intraventricular, intravenous, intraperitoneal, or intranasal
administration. In addition to the active ingredients, these pharmaceutical compositions may contain
suitable pharmaceutically acceptable carriers.

Pharmaceutical compositions for oral administration can be formulated using pharmaceutically
acceptable carriers well known in the art in dosages suitable for oral administration. Such carriers
enable the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids,
gels, syrups, slurries, suspensions and the like, for ingestion by the patient.

Pharmaceutical preparations for oral use can be obtained through combination of active
compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of
granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable
excipients are carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or
sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose,
hydroxypropylmethyl-cellulose, or sodium carboxymethyl cellulose; gums including arabic and
tragacanth; and proteins such as gelatin and collagen. If desired, disintegrating or solubilizing agents
may be added, such as cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such
as sodium alginate.

Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which
may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or
titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or
pigments may be added to the tablets or dragee coatings for product identification or to characterize
the quantity of active compound, i.e., dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made of
gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol.
Push-fit capsules can contain active ingredients mixed with a filler or binders such as lactose or
starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules,
the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid
paraffin, or liquid polyethylene glycol with or without stabilizers.

Pharmaceutical formulations for parenteral administration include aqueous solutions of active
compounds. For injection, the pharmaceutical compositions of the invention may be formulated in
aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's
solution, or physiologically buffered saline. Aqueous injection suspensions may contain substances
which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or
dextran. Additionally, suitable solvents or vehicles for suspensions of the active compounds include
fatty oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or
liposomes. Optionally, the suspension may also contain suitable stabilizers or agents which increase
the solubility of the compounds to allow for the preparation of highly concentrated solutions.

For topical or nasal administration, penetrants appropriate to the particular barrier to be
permeated are used in the formulation. Such penetrants are generally known in the art.

2. Manufacture and Storage 

The pharmaceutical compositions of the present invention may be manufactured in a manner
that is known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making,
levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.

The pharmaceutical composition may be provided as a salt and can be formed with many
acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc.
Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free
base forms. In other cases, the preferred preparation may be a lyophilized powder in 1 mM-50 mM
histidine, 0.1%-2% sucrose, 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer
prior to use.

After pharmaceutical compositions comprising a compound of the invention formulated in an
acceptable carrier have been prepared, they can be placed in an appropriate container and labeled for
treatment of an indicated condition with information including amount, frequency and method of 
administration. 

3. Therapeutically Effective Dose

Pharmaceutical compositions suitable for use in the present invention include compositions
wherein the active ingredients are contained in an effective amount to achieve the intended purpose.
The determination of an effective dose is well within the capability of those skilled in the art.

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays, or in animal models, usually mice, rabbits, dogs, or pigs. The animal model is also used
to achieve a desirable concentration range and route of administration. Such information can then be
used to determine useful doses and routes for administration in humans.

A therapeutically effective dose refers to that amount of protein or its antibodies, antagonists,
or inhibitors which ameliorate the symptoms or conditions. Therapeutic efficacy and toxicity of such
compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental
animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose
lethal to 50% of the population). The dose ratio between therapeutic and toxic effects is the
therapeutic index, and it can be expressed as the ratio, LD50/ED50. Pharmaceutical compositions
which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and
animal studies are used in formulating a range of dosage for human use. The dosage of such
compounds lies preferably within a range of circulating concentrations that include the ED50 with
little or no toxicity. The dosage varies within this range depending upon the dosage form employed,
sensitivity of the patient, and the route of administration.
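
As a worked illustration of the ratio LD50/ED50, the short sketch below computes a therapeutic index from two dose values. The function name and the example doses are hypothetical and are not data from this disclosure; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, defined in the text as LD50/ED50.

    ld50 is the dose lethal to 50% of the population and ed50 is the dose
    therapeutically effective in 50% of the population. Both must be
    positive and expressed in the same units (for example, mg per kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical values: LD50 of 500 mg/kg and ED50 of 25 mg/kg.
    print(therapeutic_index(500.0, 25.0))
```

A larger ratio indicates a wider margin between the effective and the toxic dose, which is why compositions with large therapeutic indices are preferred.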

The exact dosage is chosen by the individual physician in view of the patient to be treated.
Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain
the desired effect. Additional factors which may be taken into account include the severity of the
disease state; age, weight and gender of the patient; diet, time and frequency of administration, drug
combination(s), reaction sensitivities, and tolerance/response to therapy. Long-acting pharmaceutical
compositions might be administered every 3 to 4 days, every week, or once every two weeks
depending on the half-life and clearance rate of the particular formulation.

Dosage amounts may vary from 0.1 to 100,000 micrograms per person per day, for example,
1 µg, 10 µg, 100 µg, 500 µg, 1 mg, 10 mg, and even up to a total dose of about 1 g per person per day,
depending upon the route of administration. Guidance as to particular dosages and methods of delivery
is provided in the literature. See U.S. Patent Nos. 4,657,760; 5,206,344; or 5,225,212, hereby
incorporated by reference. Those skilled in the art will employ different formulations for nucleotides
than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific
to particular cells, conditions, locations, etc.

Without further elaboration, it is believed that one skilled in the art can, using the preceding 
description, utilize the present invention to its fullest extent. The following embodiments are, therefore, 
to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way 
whatsoever. 

The disclosures of all patents, applications, and publications mentioned above and below, 
including U.S. Ser. No. 60/342,603, are hereby expressly incorporated by reference. 

EXAMPLES 

1. Establishment of an Association Between a Given Polynucleotide Sequence and 
Diabetes 

A polynucleotide sequence according to the invention containing a mutation which is believed 
to be associated with a disease can be statistically linked to that disease by linkage analysis. An 
animal model system exhibiting a particular phenotypic defect that is characteristic of the disease of 
interest is selected. A series of genetic crosses is performed in this animal model system between 
individuals having an observable mutant phenotype and normal individuals of a control strain. At least 
one disease-related locus or a chromosomal marker that does not comprise a disease-related locus is 
used as a marker in these crosses. If a statistically significant pattern of non-random assortment of the 
mutant trait with a marker locus is observed, the trait is linked to the marker locus. 

Similarly, linkage analysis can be performed on an existing human or other mammalian 
pedigree. According to this method, numerous genetic loci from affected and unaffected family 
members are compared. Non-random assortment of a given genetic marker between affected and 
unaffected family members relative to the distributions observed for other genetic loci indicates that 
the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical 
proximity to another that does so. 

If either approach demonstrates a non-random assortment of the disease-related phenotype 
with a marker locus, this is indicative of an association between the gene underlying the defect and 
that locus. Because the strength of any conclusion drawn from linkage analysis is statistically based, 
the accuracy of the results is thought to be proportional to the number of crosses or family members 
and genetic loci analyzed. 
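
The test for non-random assortment described above can be sketched with a chi-square statistic. This is an illustrative assumption, not the method prescribed by this disclosure: it supposes a test cross in which offspring are scored as parental or recombinant for the trait and the marker, and in which independent assortment predicts a 1:1 ratio. The function names and counts are hypothetical.

```python
# Chi-square critical value for 1 degree of freedom at the 0.05 level.
CRITICAL_1DF_0_05 = 3.841


def linkage_chi_square(parental: int, recombinant: int) -> float:
    """Chi-square statistic against the 1:1 ratio expected without linkage."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("at least one offspring must be scored")
    expected = total / 2
    return ((parental - expected) ** 2 + (recombinant - expected) ** 2) / expected


def is_linked(parental: int, recombinant: int) -> bool:
    """True when parental types are in significant excess over recombinants."""
    return (parental > recombinant
            and linkage_chi_square(parental, recombinant) > CRITICAL_1DF_0_05)


# Hypothetical cross: 80 parental-type and 20 recombinant-type offspring.
print(linkage_chi_square(80, 20), is_linked(80, 20))
```

Because the statistic grows with the number of offspring scored, the sketch also reflects the point made above that the strength of the conclusion depends on the number of crosses analyzed.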

2. Screening Assay For a Disease 

A polynucleotide sequence according to the invention can be used as a marker for a normal 
phenotype or for a phenotype associated with a disease of interest. 

If it can be demonstrated by the methods of phenotyping, described above, that a particular 
sequence is associated with a disease phenotype, this sequence can be used as a marker for a 
particular disease. A sequence of interest can be used as a probe to screen genomic DNA from 
individuals by Southern blot analysis according to the method described above. If the sequence of 
interest is detected by Southern blot analysis, and the presence of this sequence is confirmed by direct 
sequencing, it can be concluded that the individual from which the genomic DNA has been isolated 
has an increased likelihood of developing the disease for which the sequence is a marker. 

The marker can also be used as a disease indicator according to the method of PCR. A 
genomic DNA sample of interest can be analyzed in a PCR reaction wherein one of the primers 
contains the marker sequence. If the marker sequence is present in the sample DNA, a PCR product 
will be produced. Alternatively, the PCR primers can be designed such that they amplify a region 
containing the marker sequence. The amplified product can be analyzed by hybridization methods, 
described above, to determine the presence of the sequence of interest. 
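
The logic of the primer-based assay above can be sketched in silico. The following is a simplified illustration under stated assumptions: primers must match the template exactly, only one strand is searched, and annealing conditions are ignored. The sequences and function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product bounded by the two primers, or None if either fails to bind."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer (e.g. one carrying the marker) does not match
    site = reverse_complement(reverse)  # where the reverse primer anneals
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical sample carrying the marker base at the 3' end of the forward primer.
with_marker = "TTACGTACGGATCCAAGCTTGGCCAA"
without_marker = "TTACGTACAGATCCAAGCTTGGCCAA"
print(amplicon(with_marker, "ACGTACG", "GGCCAAG"))
print(amplicon(without_marker, "ACGTACG", "GGCCAAG"))
```

In this sketch a product is returned only when the marker-containing primer matches the sample, mirroring the statement that a PCR product is produced if the marker sequence is present.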

3. Use of a Given Polynucleotide as a Target for Drug Screening 

A polynucleotide according to the invention, containing a mutation which is believed to be 
associated with a disease, can be used as a target for drug screening. 

One method of drug screening utilizes eukaryotic or prokaryotic host cells which are stably 
transformed with a polynucleotide according to the invention and either exhibit a particular phenotype 
characteristic of the presence of the polynucleotide or express a polypeptide or fragment encoded by 
the polynucleotide. Such cells, either in viable or fixed form, can be used for standard competitive 
binding assays. In particular, these cells can be used to measure formation of a complex comprising 
the protein product or fragment of a polynucleotide according to the invention and the agent being 
tested. Alternatively, these cells can be used to determine if the formation of a complex between the 
protein product or fragment of a polynucleotide according to the invention and a known ligand is 
interfered with by an agent being tested. 

An alternative method for drug screening involves the use of eukaryotic cell lines or cells (such 
as described above) which contain a polynucleotide according to the invention that produces a 
defective protein. According to this method, the host cell lines or cells are grown in the presence of a 
test drug. The rate of growth of the host cells is measured to determine if the compound is capable of 
regulating the growth of cells expressing a nonfunctional protein product of the polynucleotide 
according to the invention. Preferably, a drug that is useful according to the invention will increase or 
decrease the growth rate of a cell by at least 10%. Alternatively, the ability of the test compound to 
restore the function of the mutant gene protein by at least 10% can be measured by using an 
appropriate in vitro assay for function of the protein product of a gene (as described in Section F 
entitled "Identification and Characterization of Polymorphisms"). If the host cell lines or cells express 
a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the 
test compound to alter the cellular localization of the protein by at least 10% will be determined. 
Changes in the cellular localization of a protein of interest will be detected by performing cellular 
fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a 
protein of interest can be determined by immunocytochemical methods well known in the art. 

A method of drug screening may also involve the use of host eukaryotic cell lines or cells 
(described above) which have an altered gene that demonstrates an aberrant pattern of expression. 
By aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or 
the temporal pattern of expression is different from that of the wild type gene. The ability of a test 
drug to alter the expression of a mutant form of a gene by at least 10% can be measured by Northern 
blot analysis, S1 nuclease analysis, primer extension or RNase protection assays, as described above. 
Alternatively, if a mutant form of a gene contains a polymorphism in the promoter region of a gene, 
cells can be engineered to express a reporter construct comprising a mutant gene promoter driving 
expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein). These cells can be 
grown in the presence of a test compound and the ability of a test compound to alter the level of 
activity of the mutant gene promoter can be determined by standard assays for each reporter gene 
which are well known in the art. 

A transgenic animal whose genomic DNA contains a polynucleotide associated with a 
particular phenotypic defect that is characteristic of the disease of interest, and a normal, control 
animal (not containing the polynucleotide) can be treated with a candidate drug according to the 
invention. The ability of a candidate drug to ameliorate symptoms of the disease, by at least 10%, will 
be analyzed by assessing the disease symptoms and their amelioration. 
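
The "at least 10%" criterion used throughout this example can be expressed as a relative change against an untreated control. The sketch below is one possible reading; the function names and measurement values are hypothetical, and the disclosure does not specify how the percentage is to be computed.

```python
def percent_change(control: float, treated: float) -> float:
    """Relative change of the treated measurement versus control, in percent."""
    if control == 0:
        raise ValueError("control measurement must be non-zero")
    return (treated - control) / abs(control) * 100.0


def meets_threshold(control: float, treated: float, threshold: float = 10.0) -> bool:
    """True when the change is at least `threshold` percent in either direction."""
    return abs(percent_change(control, treated)) >= threshold


# Hypothetical growth rates (doublings per day) without and with a test compound.
print(meets_threshold(control=1.0, treated=1.25))
print(meets_threshold(control=1.0, treated=1.05))
```

The same comparison could be applied to reporter activity, expression level, or a localization measure, provided control and treated values are in the same units.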

4. Selection of Osteoarthritis Candidate Gene Set 

Genes involved in osteoarthritis 

Key pathogenic processes involved in osteoarthritis are: 


1. chondrocyte differentiation, development, apoptosis and signalling 

2. cartilage components and synthesis: proteoglycans, hyaluronan synthases, extracellular 
matrix molecules 

3. cartilage degradation: cathepsin proteases and matrix metalloproteinases, their inhibitors 

4. bone remodelling signals (e.g. RANK/RANKL): BMPs, TGFbeta, interleukins, their 
receptors and antagonists, downstream signaling 

5. synovial fluid components 

6. systemic factors influencing bone and cartilage remodelling: leptin, estrogen, progesterone, 
inflammatory cytokines, retinoic acid 

Polymorphisms at the following genes have been reported in the literature to be involved with 
increased risk of osteoarthritis. They include components of the extracellular matrix and 
bone-remodelling signalling components (Table 2). 

With the aim of expanding and improving on the current limited knowledge of osteoarthritis 
genetic predisposition, we have collected over 500 candidate bone and cartilage remodelling genes 
using the following methods: 

1. extensive literature search for genes involved in relevant biochemical pathways and 
physiological processes 

2. analysis and comparisons of cDNA libraries within the Incyte Lifeseq® database from 
relevant normal and diseased tissues and in vitro modelling systems 

3. co-expression analysis using Incyte's "Guilt by Association" algorithm which identifies novel 
genes in key biochemical pathways by comparing the expression patterns of genes within the 
Lifeseq® database 

5. Polymorphisms in Genes Associated with Osteoarthritis 

The osteoarthritis candidate gene list was compiled using gene or gene sequences selected 
from literature sources, using sequence homology, library subtraction and expression analysis. 

Expression analysis was performed using "guilt-by-association" queries to identify 
Incyte-novel and known genes not previously associated with diabetes which have similar expression 
patterns to genes known to be involved in diabetes or related conditions. Guilt-by-association analysis 
was performed as described in Walker et al. 1999 Genome Res 9:1198; Walker et al. 1999 Ismb:282; 
and US Patent Application 09/226,994 entitled "Insulin-Synthesis Genes" (Atty Docket No: PB-0008 
US) filed January 7, 1999, all of which are incorporated by reference. 
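
The general idea of a co-expression query can be sketched as follows. This is not the algorithm of the cited references, whose details are not reproduced here; it is a minimal illustration that assumes each gene is scored as present or absent in each cDNA library and that co-occurrence is assessed with a one-sided hypergeometric tail probability. The library identifiers and function name are hypothetical.

```python
from math import comb


def cooccurrence_p_value(a_libs: set, b_libs: set, n_libraries: int) -> float:
    """Probability of at least the observed overlap if the two genes were independent.

    a_libs and b_libs hold the identifiers of libraries in which each gene is detected.
    """
    k = len(a_libs & b_libs)          # libraries in which both genes are detected
    na, nb = len(a_libs), len(b_libs)
    if na > n_libraries or nb > n_libraries:
        raise ValueError("a gene cannot be present in more libraries than exist")
    total = comb(n_libraries, nb)
    tail = sum(comb(na, i) * comb(n_libraries - na, nb - i)
               for i in range(k, min(na, nb) + 1))
    return tail / total


# Hypothetical example: two genes detected in the same 5 of 10 libraries.
known_gene = {0, 1, 2, 3, 4}
candidate = {0, 1, 2, 3, 4}
print(cooccurrence_p_value(known_gene, candidate, n_libraries=10))
```

Under these assumptions, a small probability suggests that a candidate gene shares an expression pattern with a gene already implicated in the condition of interest.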

Polymorphisms referred to in Table 3 as source wetSNPs were discovered by fSSCP as described in 
section F "Identification and Characterization of Polymorphisms", subsection b5. Polymorphisms 
referred to as source isSNPs were discovered as described in section F "Identification and 
Characterization of Polymorphisms", subsection a. Polymorphisms 
referred to as source dbSNPs are polymorphisms in public genomic sequence where gene structure is 
unknown. The polymorphisms were mapped to cDNA sequences in the LifeSeqGold database 
(Incyte) to identify gene identity. 

6. Frequency of Polymorphisms in Diabetes Associated Genes and Polynucleotides in 
Various Populations 

Polymorphisms identified in EXAMPLES 4 and 5 were genotyped against populations 
described below by fSSCP or FP-TDI as described above. The results of the population frequency 
studies are given in Table 2. 

Two panels of human DNA have been developed to support the identification of frequent 
SNPs within an ethnically diverse population. The genomic Human Diversity Panel will be used 
where full genomic structure is available, and allows screening of the open reading frame of the gene, 
including splice junctions. In instances where genomic structure for selected candidate genes may not 
be available, a cDNA version of the HDP Screening Panel permits screening of the open reading 
frame of the gene. 

This DNA panel is derived from 47 consented individuals from four ethnic groups (Caucasian, 
African-American, Asian and Hispanic). The panel is sufficiently sized to enable identification of 95% 
of SNPs with allele population frequencies >= 5%. Comparable utility of our panel with the NIH 
Diversity panel was demonstrated by parallel screening of 90 kilobases of coding sequence from each 
panel. 
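
The relationship between panel size and the chance of observing a variant can be illustrated with a simple sampling calculation. The sketch below assumes diploid individuals, independent sampling of chromosomes, and a perfectly sensitive assay; the disclosure does not state how its 95% figure was derived, so this model is an assumption and its result need not match that figure.

```python
def detection_probability(allele_frequency: float, individuals: int, ploidy: int = 2) -> float:
    """Probability that an allele is present at least once among the sampled chromosomes."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must lie between 0 and 1")
    chromosomes = individuals * ploidy
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


# Panel size and frequency threshold taken from the text above: 47 individuals, 5% alleles.
print(round(detection_probability(0.05, 47), 3))
```

Under these idealized assumptions the sampling probability alone is about 0.99; real detection rates would be lower once assay sensitivity and uneven population representation are taken into account.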

A cDNA counterpart to our Human Diversity Panel has been generated from lymphoblastoid 
cell lines to obviate the need for intron/exon structure in 50% of human genes. In the absence of 
genomic structure, this methodology will be employed to screen the entire open reading frame of the 
gene. 

Various modifications and variations of the described compositions, methods, and systems of 
the invention will be apparent to those skilled in the art without departing from the scope and spirit of 
the invention. Although the invention has been described in connection with certain embodiments, it 
should be understood that the invention as claimed should not be unduly limited to such specific 
embodiments. Nor should the description of such embodiments be considered exhaustive or limit the 
invention to the precise forms disclosed. Furthermore, elements from one embodiment can be readily 
recombined with elements from one or more other embodiments. Such combinations can form a 
number of embodiments within the scope of the invention. It is intended that the scope of the 
invention be defined by the following claims and their equivalents. 






WO 03/054166 



PCT/US02/41225 



TABLE 1 



AACT 

Full name : alpha-l-antichymotrypsin 
Link : AACT_link_cdna 

Subsequence GB : AACT 1 

CDS GB : AACT . 1 1302 bp 
GB:AL049839_2 3 59799 59799 A>G 

source isSNP SNP00073834 

consequence AACT_cds . 2 5 Silent 86-86 F 

GB:AL049839_2 3 59844 59844 A>G 
64470 A>G 

Intron 

64476 A>G 

Intron 

64477 A>G 

Intron 
64488 A>G 

Intron 
64494 A>G 

Intron 
65434 A>G 

Intron 
65440 A>G 

Intron 

65451 A>G 

Intron 

65452 A>G 

Intron 
65458 A>G 

Intron 
68858 A>G 



5 3' 
68882 68882 A>G 
GB:AL049839_2 . v68 882 . A>G 
5 3' 



K 



ABL1 

Full name : v-abl Abelson murine leukemia viral oncogene homolog 1 
Link : ABL1_link_cdna 

Subsequence GB:NM_005157 1 5744 #6 

CDS GB:NM_005157.1 3393 bp #7 

ORF 365 3757 
Allele GB:NM_005157 6 1916 1916 C>G 
source isSNP SNP00046020 
consequence GB:NM_005157.1 7 Missense 518-518 A>P 

Allele GB:NM_005157 6 2716 2716 C>G 
source isSNP SNP00068702 
consequence GB:NM_005157.1 7 Silent 784-784 

Allele GB:NM_005157 6 3625 3625 A>G 
source isSNP SNP00098956 
consequence GB:NM_005157.1 7 Silent 1087-1087 

Allele GB:NM_005157 6 3688 3688 A>G 
source isSNP SNP00012765 
consequence GB:NM_005157.1 7 Silent 1108-1108 

Allele GB:NM_005157 6 3894 3894 C>G 
source isSNP SNP00046021 
consequence GB:NM_005157.1 7 3' 

Allele GB:NM_005157 6 4612 4612 A>G 
source isSNP SNP00051628 
consequence GB:NM_005157.1 7 3' 

Allele GB:NM_005157 6 5512 5512 A>G 
source isSNP SNP00012768 
consequence GB:NM_005157.1 7 3' 

GIF ABL1-cdna-fwd.gif 
Link : ABL1_link_genomic 



Subsequence ABL1_cds.1 73887 116507 #8 
Subsequence ABL1_cds.2 29132 116507 #9 
Subsequence GB:U07561_1 1 35962 #10 
Subsequence GB:U07563_1 36063 120601 #11 
Subsequence ABL1_mrna_build.1 73506 118495 #12 
Subsequence ABL1_mrna_build.2 28792 116507 #13 
Subsequence ABL1_mrna_build.3 73724 116507 #14 

CDS ABL1_cds.1 3393 bp 11 exons #8 
exon 73887 73965 
exon 85951 86124 
exon 86688 86983 
exon 94650 94922 
exon 104016 104100 
exon 104747 104924 
exon 106755 106939 
exon 109237 109389 
exon 110890 110979 
exon 111322 111486 
exon 114793 116507 

CDS ABL1_cds.2 3450 bp 11 exons #9 
exon 29132 29267 
exon 85951 86124 
exon 86688 86983 
exon 94650 94922 
exon 104016 104100 
exon 104747 104924 
exon 106755 106939 
exon 109237 109389 
exon 110890 110979 
exon 111322 111486 
exon 114793 116507 

mRNA ABL1_mrna_build.1 5762 bp 11 exons #12 
exon 73506 73965 
exon 85951 86124 
exon 86688 86983 
exon 94650 94922 
exon 104016 104100 
exon 104747 104924 
exon 106755 106939 
exon 109237 109389 
exon 110890 110979 
exon 111322 111486 
exon 114793 118495 

mRNA ABL1_mrna_build.2 3787 bp 11 exons #13 
exon 28792 29267 
exon 85954 86124 
exon 86688 86983 
exon 94650 94922 
exon 104016 104100 
exon 104747 104924 
exon 106755 106939 
exon 109237 109389 
exon 110890 110979 
exon 111322 111486 
exon 114793 116507 

mRNA ABL1_mrna_build.3 3556 bp 11 exons #14 
exon 73724 73965 
exon 85951 86124 
exon 86688 86983 
exon 94650 94922 
exon 104016 104100 
exon 104747 104924 
exon 106755 106939 
exon 109237 109389 
exon 110890 110979 
exon 111322 111486 
exon 114793 116507 

Allele GB:U07561_1 10 29061 29061 A>G 
source isSNP SNP00120072 
consequence ABL1_cds.1 8 5' 
consequence ABL1_cds.2 9 5' 

Allele GB:U07561_1 10 30837 30837 A>G 
source dbSNP gnl|dbSNP|ss642659_allele 
source dbSNP gnl|dbSNP|ss1045108_allele 
source dbSNP gnl|dbSNP|ss1044696_allele 
consequence ABL1_cds.1 8 5' 
consequence ABL1_cds.2 9 Intron 

Allele GB:U07563_1 11 35864 35864 A>G 
source isSNP SNP00048470 
consequence ABL1_cds.1 8 5' 
consequence ABL1_cds.2 9 Intron 

Allele GB:U07563_1 11 58876 58876 C>G 
source wetSNP GB:U07563_1.v58876.C>G 
consequence ABL1_cds.1 8 Intron 
consequence ABL1_cds.2 9 Intron 

Allele GB:U07563_1 11 68640 68640 A>G 
source wetSNP GB:U07563_1.v68640.T>C 
consequence ABL1_cds.1 8 Intron 
consequence ABL1_cds.2 9 Intron 

Allele GB:U07563_1 11 74901 74901 A>G 
source wetSNP GB:U07563_1.v74901.A>G 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563„1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563__1 

source 

source 

consequence 

consequence 

GB:U07563__1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563__1 

source 

consequence 

consequence 

GB:"U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 



ABLl_cds . 1 8 
ABLl_cds . 2 9 
11 75298 75298 

isSNP SNP00046020 
ABLl_cds . 1 8 
ABLl__cds . 2 
11 78921 
wetSNP 
ABLl_cds . 1 
ABLl_cds . 2 
11 79239 
wetSNP 
ABLl_cds . 1 
ABLl_cds . 2 
11 79404 



Silent 
Silent 
OG 



499-499 
518-518 



Missense 
9 Missense 
78921 A>G 
GB:U07563_1 .V78921 

8 Silent 

9 Silent 
79239 A>G 
GB:U07563_l.v79239 

8 Silent 

9 Silent 
79404 OG 

isSNP SNP00068702 

wetSNP GB:U07563„1 . V79404 

8 Silent 

9 Silent 
79657 A>G 
GB:U07563_1 .v79657 

8 Missense 

9 Missense 
79750 A>G 
GB:U07563_l.v79750 

8 Missense 

9 Missense 
80313 A>G 



ABLl_cds . 1 
ABLl_cds . 2 
11 • 79657 
wetSNP 
ABLl_cds . 1 
ABLl_cds . 2 
11 79750 
wetSNP 
ABLl_cds . 1 
ABLl_cds . 2 
11 80313 
isSNP SNP00098956 
ABLl_cds.l 8 Silent 
ABLl_cds.2 9 Silent 
11 80376 80376 A>G 

isSNP SNP00012765 
wetSNP GB : U0 7 5 6 3_1 . v8 0 3 7 6 

ABLl_cds.l 8 Silent 
ABLl_cds.2 9 Silent 
11 80582 80582 C>G 

isSNP SNP00046021 
ABLl_cds .18 3 ' 

ABLl_cds .2 9 3 ' 

11 81298 81298 A>G 

isSNP SNP00051628 
ABLl_cds .18 3 ' 

ABLl_cds .29 3 ' 

11 81806 81806 A>G 

isSNP SNP00012766 
ABLl_cds .18 3 ' 

ABLl_cds.2 9 3' 
11 82199 82199 A>G 

isSNP SNP00012768 
ABLl_cds .18 3 ' 

ABL1 cds .2 9 3 ' 



518-518 
537-537 

,G>A 
623-623 
642-642 

,G>A 
729-729 
748-748 



,C>G 

784-784 

803-803. 

.OT 
869-869 
888-888 

.C>T 
900-900 
919-919 



1087-1087 
1106-1106 



. G>A 

1108-1108 
1127-1127 



E 
E 



A>P 
A>P 



E 
E 



T 
T 



P 
P 



P>S 
P>S 



P>S 
P>S 



I 
I 



p 
p 



GIF ABLl-genomic-fwd.gif 
ADAM9 

Full name : a disintegrin and metalloproteinase domain 9 
Link : ADAM9_link_cdna 

Subsequence GB:HSU41766 1 3865 



CDS GB:HSU41766.1 



79 2538 
GB:HSU41766 
source 
consequence 
GB:HSU417 6 6 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU4176 6 
source 
consequence 
GIF ADAM9-cdna-fwd.gif 



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



#15 

2460 bp #16 

15 462 462 G>T 

isSNP SNP00060630 

GB:HSU41766 .1 16 

15 1486 1486 A>G 

isSNP SNP00122821 

GB:HSU41766.1 16 

15 1580 1580 G>T 

isSNP SNP00060631 

GB:HSU41766 .1 16 

15 2845 2845 A>G 

isSNP SNP00024957 

GB:HSU41766.1 16 

15 3112 3112 A>G 

isSNP SNP00122822 

GB:HSU41766.1 16 

15 3703 3703 A>G 

isSNP SNP00024958 

GB:HSU4176 6.1 16 



Missense 



Missense 



Missense 



128-128 



470-470 



501-501 



I>M 



G>S 



N>T 



ADAMTS1 

Full name : a disintegrin-like and metalloprotease (reprolysin type) with 
thrombospondin type 1 motif, 1 
Link : ADAMTS1_link_cdna 

Subsequence GB:AF060152_1 1 3430 #17 

CDS GB:AF060152_1.1 2853 bp #18 



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



238 3090 

GB:AF060152_1 17 140 140 

source isSNP SNP00109009 

consequence GB : AF060152_1 . 1 18 
GB:AF060152_1 17 282 282 

source isSNP SNP0 007 1624 

consequence GB : AF060152„1 . 1 18 
GB:AF060152_1 17 768 768 

source isSNP SNP00069180 

consequence GB : AF 06 015 2_1 . 1 18 
GB:AF060152_1 17 865 865 

source isSNP SNP00069181 

consequence GB : AF0 6 01 5 2__1 . 1 18 
GB:AF060152_1 17 1686 1686 

source isSNP SNP00033973 

consequence GB : AF0 6015 2_1 . 1 18 
GB:AF060152_1 17 2294 2294 

source isSNP SNP00109010 

consequence GB : AF060152_1 . 1 18 
GB:AF060152_1 17 2370 2370 

source isSNP SNP0 003^74 



OG 

5' 
G>T 

Silent 
G>T 

Silent 
OG 

Missense 
A>G 

Silent 
A>G 

Missense 
A>G 



15-15' P 



177-177 



210-210 



483-483 



686-686 



V 



P>A 



R>H 
consequence GB:AF060152_1.1 18 Silent 711-711 

Allele GB:AF060152_1 17 2958 2958 A>G 
source isSNP SNP00033975 

consequence GB : AF060152_1 . 1 18 Silent 907-907 
GIF ADAMTSl-cdna-fwd.gif 



ADAMTS4 

Full name : a disintegrin-like and metalloprotease (reprolysin type) with 
thrombospondin type 1 motif, 4 
Link : ADAMTS4_link_cdna 

Subsequence GB:NM_005099_1 1 4301 #19 

CDS GB:NM_005099_1.1 2514 bp #20 

ORF 401 2914 
Allele GB:NM_005099_1 19 2970 2970 A>G 
source isSNP SNP00022951 
consequence GB:NM_005099_1.1 20 3' 

Allele GB:NM_005099_1 19 3529 3529 A>G 
source dbSNP gnl|dbSNP|ss610462_allele 
consequence GB:NM_005099_1.1 20 3' 

Allele GB:NM_005099_1 19 3533 3533 A>G 
source dbSNP gnl|dbSNP|ss722414_allele 
source dbSNP gnl|dbSNP|ss999631_allele 
consequence GB:NM_005099_1.1 20 3' 

Allele GB:NM_005099_1 19 3855 3855 A>G 
source dbSNP gnl|dbSNP|ss1298908_allele 
consequence GB:NM_005099_1.1 20 3' 

GIF ADAMTS4-cdna-fwd.gif 



AGC1 

Full name : aggrecan 1 
Link : AGC1_link_cdna 

Subsequence GB:HUMAGPRO 1 7137 #21 

CDS GB:HUMAGPRO.1 6951 bp #22 

ORF 61 7011 
Allele GB:HUMAGPRO 21 6495 6495 G>T 
source isSNP SNP00010327 
consequence GB:HUMAGPRO.1 22 Silent 2145-2145 

GIF AGC1-cdna-fwd.gif 



ANK 

Full name : human homolog of mouse ank gene 
Link : ANK_fl_link_cdna 

Subsequence FN:3255641CB1 1 1481 #23 

CDS FN:3255641CB1.1 1338 bp #24 

ORF 106 1443 
Allele FN:3255641CB1 23 258 258 A>G 
source isSNP SNP00073561 
consequence FN:3255641CB1.1 24 Silent 51-51 A 

Allele FN:3255641CB1 23 1048 1048 C>G 

source isSNP SNP00036339 

consequence FN: 3255641CB1 . 1 

Allele FN:3255641CB1 23 1106 

source isSNP SNP00075037 

consequence FN: 3255641CB1 . 1 

Allele FN:3255641CB1 23 1373 

source isSNP SNP00045819 

consequence FN: 3255641CB1 . 1 

GIF ANK-cdna-fwd.gif 
Link : ANK_link_cdna 

Subsequence GB : AF2747 53_1 1 

CDS GB:AF274753_1.1 1479 bp 



69 1547 

GB:AF274753_1 25 362 

source isSNP SNP00073561 

consequence GB : AF274753_1 . 1 
GB:AF274753__1 25 1152 

source isSNP SNP00036339 

consequence GB : AF2747 53_1 . 1 
GB:AF274753_1 25 1210 

source isSNP SNP00075037 

consequence GB : AF274753_1 . 1 
GB:AF274753_1 25 1477 

source isSNP SNP00045819 

consequence GB : AF2 7475 3_1 . 1 
GIF ANK-cdna-fwd.gif 
Link : ANK_link_genomic 



ORF 
Allele 



Allele 



Allele 



Allele 



Subs equence 
Subs equence 
Subsequence 
Subsequence 
Subsequence 



ANK__cds.l 26332 84281 
GBI : AC016575_6_000010 
GB:AC026437_2 706 
ANK„mrna_bui 1 d . 1 308 



ANK cds.2 



CDS ANK_cds.1 1338 bp 
exon 26332 26503 
exon 36882 37000 
exon 39535 39618 
exon 44240 44410 
exon 46173 46307 
exon 49517 49609 
exon 53557 53652 
exon 78643 78772 
exon 81811 81934 
exon 82505 82604 
exon 84168 84281 


CDS 


ANK_c 


ds . 2 


1479 bp 




exon 


272 


367 




exon 


26287 


26503 




exon 


36882 


37000 




exon 


39535 


39618 




exon 


44240 


44410 




exon 


46173 


46307 




exon 


49517 


49609 




exon 


53557 


53652 




exon 


78643 


78772 




exon 


81811 


81934 



272 84281 
11 exons 



12 exons 



24 Missense 

1106 A>G 

24 Missense 

1373 A>G 



24 



Missense 



315-315 



334-334 



423-423 



A>P 



V>A 



S>F 



1568 #25 
#26 



362 



A>G 



26 Silent 

1152 OG 

26 Missense 

1210 A>G 

26 Missense 

1477 A>G 



26 



Missense 



98-98 A 



362-362 



381-381 



470-470 



A>P 



V>A 



S>F 



#27 

1 605 
92528 #29 
85658 #30 
#31 
#27 



#28 



#31 
exon 
exon 

mRNA 

exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



82505 82604 
84168 84281 
ANK_mr n a_bu i 1 d . 1 



2820 bp 



12 exons 



#30 



308 

26287 

36882 

39535 

44240 

46173 

49517 

53557 

78643 

81811 

82505 

84168 



367 

26503 

37000 

39618 

44410 

46307 

49609 

53652 

78772 

81934 

82604 

85658 



GB:AC026437_ 
source 
consequence 
consequence 
GB:AC026437_ 
source 
source 
source 
source 
consequence 
consequence 
GB:AC026437. 
source 
consequence 
consequence 



2 29 8413 8413 OG 

dbSNP gnl|dbSNP|ss95678_allele 
ANK_cds.l 27 5' 
ANK_cds .2 31 Intron 

_2 29 14825 14825 A>G 

dbSNP gnl|dbSNP|ss619053_allele 
dbSNP gnl|dbSNP|ssl002004_allele 
dbSNP gnl | dbSNP j ss2279 83_allele 
dbSNP gnl|dbSNP|ss324626_allele 



ANK_cds . 1 
A]STK_cds . 2 
2 29 
wetSNP 
ANK_cds . 1 
ANK_cds . 2 
GB:AC026437_2 29 
source 
source 



27 5' 

31 Intron 

25779 25779 A>G 

GB:AC026437„2 .v25779.C>T 

27 Silent 51-51 A 

31 Silent 98-98 A 

25807 25807 A>G 



isSNP SNP00104502 
we t SNP GB : AC 0 2 6 4 3 7_ 



2.v2 5807.G>A 



consequence ANK_cds . 1 27 Intron 
consequence ANK_cds . 2 31 Intron 
GB:AC02 6437_2 29 26433 26433 A>G 



source 
consequence 
consequence 
GB:AC026437. 
source 
source 
consequence 
consequence 



isSNP SNP00018441 
ANK_cds . 1 27 Intron 
ANK_cds .2 31 Intron 
2 29 30696 30696 A>T 

dbSNP gnl | dbSNP | ssl016631_allele 
dbSNP gnl|dbSNP|ss389763_allele 



AISIK_cds.l 27 
ANK_cds . 2 31 
GB:AC026437_2 29 34277 

source isSNP SNP00101566 

consequence ATSTK_cds . 1 27 
consequence ANK_cds . 2 
GB: AC026437_2 29 
source wetSNP 
consequence AKK„cds . 1 
consequence ANK_cds . 2 
GB:AC026437_2 29 
source isSNP SNP00056800 

consequence ANK_cds . 1 2^ Q 



Intron 
Intron 
34277 A>G 

Intron 
Intron 
36172 A>G 
GB:AC026437_2 .v36172 .T>C 
27 Intron 
31 Intron 
37028 37028 G>T 

Intron 



31 

36172 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



consequence 

GB:AC026437. 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC02 6437„ 

source 

consequence 

consequence 

GB:AC02 6437. 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC02 6437_ 

source 

consequence 

consequence 

GB:AC02 6437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB:AC026437_ 

source 

consequence 

consequence 

GB : AC026437_ 

source 



ANK_cds . 2 31 
2 29 37186 

isSNP SNP00022144 
ANK_cds . 1 27 
ANK_cds.2 31 
2 29 37205 

isSNP SNP00022143 
ANK_cds . 1 27 
ANK_cds.2 31 
2 29 37340 



Intron 
37186 G>T 

Intron 
Intron 
37205 A>G 

Intron 
Intron 
37340 A>T 



dbSNP gnl | dbSNP | ss469809_allele 



ANK__cds . 1 

ANK_cds . 2 
2 29 

wetSNP 

ANK_cds . 1 

ANK_cds . 2 
_2 29 

wetSNP 

ANK_cds . 1 

ANK_cds . 2 
2 29 

wetSNP 

ANK_cds . 1 

ANK_cds . 2 
2 29 

isSNP SNP00093702 

ANK_cds.l . ..27 

ANK_cds.2. . : 31 . 
2 29 78010 

isSNP SNP00036339 

ANK_cds.l 27 

ANK_cds.2 31 
2 29 78875 

isSNP SNP00095793 



27 Intron 
31 Intron 
52817 52817 G>T 
GB:AC026437_2.v52817 .OA 
27 Intron 
31 Intron 
52899 52899 A>G 
GB:ACQ26437_2 . v52899 . A>G 
27 Silent 274-274 

31 Silent 321-321 

52962 52962 G>T 
GB:AC026437_2 . v52962 . T>G 
27 Intron 
31 Intron 
63950 63950 A>G 



Intron 
Intron 
78010 C>G 



A 
A 



27 
31 

82852 



ANK_cds . 1 
ANK_cds . 2 
2 29 
wetSNP 
ANK_cds . 1 
ANK_cds . 2 
2 29 
isSNP SNP00120424 
ANK_cds.l 27 
ANK_cds.2 31 
2 29 83057 

isSNP SNP00120425 
ANK_cds . 1 27 
ANK„cds.2 31 
2 29 83506 

isSNP SNP00045819 
ANK_cds.l 27 
ANK_cds.2 31 
.2 29 83587 



Missense 
Missense 
78875 A>G 

Intron 
Intron 
81235 A>G 



315-315 
362-362 



A>P 
A>P 



27 
31 

81235 

GB:AC026437_2 .v81235 .T>C 



Intron 
Intron 
82852 A>G 

Intron 
Intron 
83057 A>G 

Intron 
Intron 
83506 A>G 

Missense 
Missense 
83587 A>G 



423-423 
470-470 



S>F 
S>F 



wetSNP 



AC026437_2 . v83 587 . G>A 






consequence 
consequence 

Allele GB:AC026437_

source 
source 
consequence 
consequence 

Allele GB:AC026437_ 
source 
consequence 
consequence 

Allele GB:AC026437_ 
source 
consequence 
consequence 

Allele GB:AC026437_ 
source 
consequence 
consequence 

GIF ANK-genomic-fwd. gi 



3' 
3 ' 

83607 A>G 



ANK_cds.1 27

ANK_cds.2 31

2 29 83607

isSNP SNP00008779

wetSNP GB:AC026437_2.v83607.A>G

ANK_cds.1 27 3'

ANK_cds.2 31 3'

2 29 84086 84086

isSNP SNP00012596
ANK_cds.1 27
ANK_cds.2 31

2 29 84156

isSNP SNP00045820
ANK_cds.1 27
ANK_cds.2 31

2 29 84896

isSNP SNP00045822
ANK_cds.1 27
ANK_cds.2 31



3 ' 
3 ' 

84156 

3' 
3' 

84896 

3 ' 
3 ' 



A>G 



A>G 



G>T 



BGLAP 

Full name : Bone Gla Protein 

Link : FL_104137_link_genomic 

Subsequence GB:AC007227_2_104137CD1

Subsequence GB:AC007227_2 1

Subsequence BGLAP_mrna_build.1



300 bp 



mRNA BGLAP_mrna_build.1

35539 35458 
35200 35162 
34991 34922 
34720 34461 
CDS GB:AC007227_2_104137CD1 
exon 35521 35458 
35200 35162 
34991 34922 
34720 34594 
GB:AC007227_2 33 
source wetSNP 
consequence GB:AC007227_
GB:AC007227_2 33
source wetSNP
consequence GB:AC007227_
GB:AC007227_2 33 



451 bp 



35521 34594 #32 
167932 #33 
35539 34461 #34 
4 exons 



#34 



exon 
exon 
exon 
exon 



exon 
exon 
exon 
Allele 



Allele 



Allele 



4 exons 



#32 



34618 34618 C>G
GB:AC007227_2.v34618.G>C

2_104137CD1 32 Silent 92-92 A

34977 34977 G>T
GB:AC007227_2.v34977.G>T

2_104137CD1 32 Missense 40-40 Q>K

35228 35228 C>G



source isSNP SNP00038471 

consequence GB:AC007227_2_104137CD1
GIF BGLAP-genomic-rev.gif 



32 



Intron 



BGN 

Full name : BGN 
Link : BGN_link_cdna 




Subsequence GB:HUMHPGI 1 1685

CDS GB:HUMHPGI.1 1107 bp



ORF 
Allele 



Allele 



Allele 



Allele 



121 1227 
GB:HUMHPGI
source
consequence
GB:HUMHPGI
source
consequence
GB:HUMHPGI
source
consequence
GB:HUMHPGI
source 



35 70 70 

isSNP SNP00011488 
GB:HUMHPGI.1
35 261 261

isSNP SNP00011489
GB:HUMHPGI.1
35 660 660

isSNP SNP00011490
GB:HUMHPGI.1
35 1355 1355

isSNP SNP00092805
GB:HUMHPGI.1



consequence 
GIF BGN-cdna-fwd.gif 
Link : BGN_link_genomic

Subsequence GB:U82695 1 76146

Subsequence GB:U82695_2540367CD1

Subsequence BGN_mrna_build.1 8415



CDS GB:U82695_2540367CD1 
exon 18042 18279 
18648 
19272 
19938 
20239 
20456 
21657 



1107 bp 



exon 
exon 
exon 
exon 
exon 
exon 
mRNA 

exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



Allele 



Allele 



18760 
19485 
20048 
20332 
20594 
21854 
BGN_mrna_build.1
8415 8523 
18279 
18760 
19485 
20048 
20332 
20594 
22311 



1684 bp



Allele 



Allele 



Allele 



18031 
18648 
19272 
19938 
20239 
20456 
21657 
GB:U82695 
source 
consequence 
GB:U82695 
source 
consequence 
GB:U82695 
source 
source 
consequence 
GB:U82695 
source 
consequence 
GB:U82695 
source 
consequence 
GB:U82695 
source 



#35 
#36 

G>T 

36 
A>G 

36 
A>G 

36 
A>G 

36 



#37 

18042 21854 
22311 #39 
7 exons 



Silent 



Silent 



47-47 S 



180-180 



#38 



#38 



8 exons 



#39 



37 8484 8484 G>T 

isSNP SNP00011488 

GB:U82695_2540367CD1 38 5' 

37 18161 18161 A>G 

wetSNP GB:U82695.v18161.A>G

GB:U82695_2540367CD1 38 Silent

37 18182 18182 A>G

isSNP SNP00011489

wetSNP GB:U82695.v18182.G>A

GB:U82695_2540367CD1 38 Silent

37 18330 18330 A>G

wetSNP GB:U82695.v18330.G>A

GB:U82695_2540367CD1 38 Intron

37 18354 18354 A>G

wetSNP GB:U82695.v18354.G>A

GB:U82695_2540367CD1 38 Intron 

37 19460 19460 A>G 

isSNP SNP00011490 



40-40 E 



47-47 S 







Allele 



Allele 



Allele 



Allele 



source wetSNP GB:U82695.v19460.T>C

consequence GB:U82695_2540367CD1 38 Silent



180-180 



GB:U82695 

source 

consequence 

GB:U82695

source

consequence

GB:U82695

source 

consequence 

GB:U82695 

source 

consequence 



37 21566 21566 G>T 

wetSNP GB:U82695.v21566.G>T

GB:U82695_2540367CD1 38 Intron

37 21639 21639 A>G

wetSNP GB:U82695.v21639.C>T

GB:U82695_2540367CD1 38
37 21982 21982 A>G

isSNP SNP00092805
GB:U82695_2540367CD1 38

37 22172 22172 G>T

isSNP SNP00011491
GB:U82695_2540367CD1 38



Intron 



GIF BGN-genomic-fwd.gif 



BHLHB2 

Full name : basic helix-loop-helix domain containing, class B, 

Link : BHLHB2_link_cdna

Subsequence GB:AB004066_1 1 2922 #40

CDS GB:AB004066_1.1 1239 bp #41 



ORF 
Allele 



Allele 



Allele 



Allele 



197 1435 

GB:AB004066_ 

source 

consequence 

GB:AB004066_ 

source 

consequence 

GB:AB004066_ 

source 

consequence 

GB:AB004066_ 

source 

consequence 



40 



196 



.1 

isSNP SNP00062724 
GB:AB004066_1.1 

1 40 829 

isSNP SNP00046376 
GB:AB004066_1.1 

1 40 2070 

isSNP SNP00013041 
GB:AB004066_1.1 

1 40 2323 

isSNP SNP00013042 
GB:AB004066_1.1 



196 

41 

829 

41 

2070 
41 

2323 
41 



A>G 

5' 
A>G 

Silent 
A>G 

3 ' 
A>G 



211-211 



GIF BHLHB2-cdna-fwd.gif 



BMP2 

Full name : BMP2 
Link : BMP2_link_cdna 

Subsequence GB:HUMBMP2A

1547

CDS GB:HUMBMP2A.1



ORF 
Allele 



Allele 



Allele 



324 1514 
GB:HUMBMP2A
source
consequence
GB:HUMBMP2A
source
consequence
GB:HUMBMP2A
source 



1191 bp 



42 584 584 

isSNP SNP00015730 
GB:HUMBMP2A.1
42 760 760

isSNP SNP00015731
GB:HUMBMP2A.1
42 984 984

isSNP SNP00015732



#42 
#43 

A>G 

43 
A>G 

43 
G>T 



Silent 



Missense 



87-87 S 



146-146 



T>I 



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TABLE 1 (Cont.) 



GB:HUMBMP2A.1
42 1484 1484

isSNP SNP00015733
GB:HUMBMP2A.1



43 
A>G 



consequence 
Allele GB:HUMBMP2A

source

consequence GB:HUMBMP2A.1 43
GIF BMP2-cdna-fwd.gif
Link : FL_3220019_link_genomic

Subsequence GB:HS859D4 1 178870

Subsequence GB:HS859D4_3220019CD1 176685



Missense 



Silent 



221-221 



387-387 



H>N 



#44 



167723 



#45 



Subsequence BMP2_mrna_build.1 178252 167687 #46



mRNA 



BMP2_mrna_build.1 1547 bp
178252 177937 
176692 176340 
168564 167687 
CDS GB:HS859D4_3220019CD1 1188 bp 

exon 176685 176340 
168564 
GB:HS859D4 
source 
consequence 



exon 
exon 
exon 



exon 
Allele 



3 exons 



2 exons 



167723 

44 167750 167750 

isSNP SNP00015733 
GB:HS859D4_3220019CD1 45 



#46 



#45 



A>G 



D 

Allele 



H>N 
Allele 



R>S 
Allele 



T>I 
Allele 



GB:HS859D4 

source 

consequence 

GB:HS859D4 

source 

consequence 

GB:HS859D4 

source 

consequence 

GB:HS859D4 
source 
source 
consequence 



44 168250 168250 

isSNP SNP00015732 
GB:HS859D4_3220019CD1 45 



Silent 



G>T 



Missense 



44 168341 168341 A>T 

wetSNP GB:HS859D4.v168341.T>A



GB:HS859D4_3220019CD1 



45 



44 168474 168474 

isSNP SNP00015731 
GB:HS859D4_3220019CD1 45 



176425 



44 176425 
isSNP SNP00015730 

wetSNP GB:HS859D4.v176425.T>C



Missense 



A>G 



Missense 



A>G 



GB:HS859D4_3220019CD1



45 



Silent 



387-387 



221-221 



190-190 



146-146 



87-87 S 



GIF BMP2-genomic-rev.gif 



BMP4 

Full name : BMP4 
Link : BMP4_link_cdna
Subsequence GB:HUMBMP2B 1

CDS GB:HUMBMP2B.1 1227 bp



ORF 
Allele 



Allele 



395 1621 
GB:HUMBMP2B
source
consequence
GB:HUMBMP2B
source
consequence
GIF BMP4-cdna-fwd.gif
Link : BMP4_link_genomic

Subsequence GB:HSU43842



308 



1751 



308 



47 

isSNP SNP00074676 
GB:HUMBMP2B.1
47 849 849

isSNP SNP00000573
GB:HUMBMP2B.1



#47 
#48 

A>G 

48 
A>G 

48 



33 #49 



Missense 



152-152 



V>A 



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Subsequence 
Subsequence 
mRNA 



GB:HSU43842_1613615CD1
BMP4_mrna_build.1 3207

BMP4_mrna_build.1
3207 3468 
6620 6744 
7791 8167 
9131 10117 
CDS GB:HSU43842_1613615CD1 
exon 7798 8167 
9131 9984 
GB:HSU43842 
source 
consequence 
GB:HSU43842
source 



exon 
exon 
exon 
exon 



exon 
Allele 



Allele 



1751 bp 



1224 bp 



7798 9984 
10117 #51 
4 exons 



2 exons 



#50 



#51 



#50 



49 



A>G 



Allele 



consequence 
GB:HSU43842



source 
source 
consequence 
A>V 

GIF BMP4-genomic-fwd.gif



6665 6665 
isSNP SNP00074676 
GB:HSU43842_1613615CD1 
49 7752 7752 A>G 

isSNP SNP00117542 
GB:HSU43842_1613615CD1
49 9215 9215 A>G

isSNP SNP00000573

wetSNP GB:HSU43842.v9215.C>T



50 



50 



GB:HSU43842_1613615CD1



50 



Missense 



152-152 



BMP6

Full name : BMP6
Link : BMP6_link_cdna

Subsequence GB:HUMTGFBC 1

CDS GB:HUMTGFBC.1 1542 bp

160 1701 
GB:HUMTGFBC
source
consequence
GB:HUMTGFBC
source
consequence
GB:HUMTGFBC
source
consequence
GB:HUMTGFBC
source 
consequence 
GIF BMP6-cdna-fwd.gif 



ORF 
Allele 



Allele 



Allele 



Allele 



2923 #52 
#53 



52 1263 1263 C>G

isSNP SNP00069306

GB:HUMTGFBC.1 53

52 2280 2280 G>T

isSNP SNP00021640

GB:HUMTGFBC.1 53

52 2436 2436 A>G

isSNP SNP00003240

GB:HUMTGFBC.1 53

52 2574 2574 A>G

isSNP SNP00021639

GB:HUMTGFBC.1 53



Silent 



368-368 



V 



CAPN4 

Full name : calpain, small polypeptide 

Link : FL_508926_link_genomic 

Subsequence GB:CH19F24590 1 41369

Subsequence GB:CH19F24590_3639962CD1

Subsequence FL_3639962_mrna_build.1 30073

Subsequence CAPN4_cds.1 31006 39833 #57

mRNA FL_3639962_mrna_build.1 1309 bp




#54 

31006 

40241 



39830 
#56 



11 exons 



#55 



#56 



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exon 


30073 


30151 


exon 30991 31214
exon 32294 32327
exon 32646 32735
exon 32903 32960
exon 33058 33122
exon 35800 35868
exon 35970 36048
exon 36190 36306
exon 39572 39630
exon 39807 40241

CAPN4_cds.1 717 bp
exon 31006 31214
exon 32294 32327
exon 32903 32960
exon 33058 33122
exon 35800 35868
exon 35970 36048
exon 36190 36306
exon 39572 39630
exon 39807 39833



CDS GB:CH19F24590_3639962CD1 804 bp 10 exons #55
exon 31006 31214
exon 32294 32327
exon 32646 32735
exon 32903 32960
exon 33058 33122
exon 35800 35868
exon 35970 36048
exon 36190 36306
exon 39572 39630
exon 39807 39830



GIF CAPN4-genomic-fwd.gif 



CBFA1

Full name : CBFA1
Link : CBFA1_link_cdna

Subsequence GB:HUMCBFA 1

CDS GB:HUMCBFA.2 1323 bp

ORF 1 1323
Allele GB:HUMCBFA 58 260

source isSNP SNP000

consequence GB:HUMCBFA.2
GIF CBFA1-cdna-fwd.gif
Link : CBFA1_link_genomic



1411 



260 
63798 



#58 
#59 

A>G 

59 



Missense 



87-87 G>E 



Subsequence GB:HSCBFA1S1 1 93 #60
Subsequence GB:HSCBFA1S2 194 669 #61
Subsequence GB:HSCBFA1S3 770 1034 #62
Subsequence GB:HSCBFA1S4 1135 1381 #63
Subsequence GB:HSCBFA1S5 1482 1759 #64
Subsequence GB:HSCBFA1S6 1860 2081 #65
Subsequence GB:HSCBFA1S7 2182 2301 #66
Subsequence GB:HSCBFA1S8 2402 3033 #67



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CBFAl__cds.l 28 2948 #68 

1566 bp 8 exons #68 

85 
625 
977 
1302 
1706 
2042 
2266 
2948 

GB:HSCBFA1S3 62 
source wetSNP 
consequence CBFA1_cds
GB:HSCBFA1S8 67
source wetSNP

consequence CBFA1_cds.1 68 Silent
GIF CBFA1-genomic-fwd.gif

Subsequence
CDS CBFA1_cds

exon 28

exon 

exon 

exon 

exon 

exon 

exon 

exon 
Allele 



Allele 



261 
821 
1198 
1533 
1881 
2201 
2470 



177 177 A>G
GB:HSCBFA1S3.v177.C>T
68 Silent 183-183

490 490 A>G
GB:HSCBFA1S8.v490.C>T

503-503 



N 



CD36 

Full name : CD36 Glycoprotein
Link : CD36_link_cdna 



Subsequence EM:HSCD3621 1 2216 #69
Allele EM:HSCD3621 69 123 123 G>T
source isSNP SNP00011023
Allele EM:HSCD3621 69 196 196 A>G
source isSNP SNP00096573
Allele EM:HSCD3621 69 230 230 C>G
source isSNP SNP00110263
Allele EM:HSCD3621 69 827 827 A>G
source isSNP SNP00115780
Allele EM:HSCD3621 69 1332 1332 A>G
source isSNP SNP00096574





Link : CD36_link_genomic 



Subsequence CD36_link_cds.1 2094 6548 #70
Subsequence EM:HSCD36G1 101 236 #71
Subsequence EM:HSCD36A 338 2898 #72
Subsequence EM:HSCD36G4 3000 3220 #73
Subsequence EM:HSCD36G5 3322 3529 #74
Subsequence EM:HSCD36AA 3631 3999 #75
Subsequence EM:HSCD36G7 4101 4252 #76
Subsequence EM:HSCD36G8 4354 4460 #77
Subsequence EM:HSCD36G9 4562 4691 #78
Subsequence EM:HSCD36G10 4793 5042 #79
Subsequence EM:B74110 5144 5803 #80
Subsequence EM:HSCD36G12 5905 6038 #81
Subsequence EM:HSCD36G13 6140 6252 #82
Subsequence EM:HSCD36G14 6354 6847 #83
Subsequence EM:HSCD36G15 6949 7632 #84
Subsequence CD36_mrna_build.1 136 7602 #85
mRNA CD36_mrna_build.1 2217 bp 16 exons



exon 136 206
exon 1446 1539 
exon 2005 2213 
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exon 


3030 


3190 


exon 3352 3499
exon 3719 3898
exon 4131 4222
exon 4384 4430
exon 4592 4661
exon 4824 5011
exon 5265 5383
exon 5935 6008
exon 6168 6222
exon 6384 6548
exon 6979 7071
exon 7152 7602

CD36_link_cds.1
exon 2094 2213
exon 3030 3190
exon 3352 3499
exon 3719 3898
exon 4131 4222
exon 4384 4430
exon 4592 4661
exon 4824 5011
exon 5265 5383
exon 5935 6008
exon 6168 6222
exon 6384 6548



1419 bp 



12 exons 



#70 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



72 1160 1160 G>T

isSNP SNP00011023

CD36_link_cds.1 70

72 1698 1698 A>G

isSNP SNP00096573

CD36_link_cds.1 70

72 1732 1732 C>G
isSNP SNP00110263
CD36_link_cds.1 70

73 102 102 C>G



wetSNP 



EM:HSCD36A 
source 
consequence 
EM:HSCD36A 
source 
consequence 
EM:HSCD36A 
source 
consequence 
EM:HSCD36G4 
source 
consequence 
EM:HSCD36AA
source 
consequence 
EM:HSCD36G10 
source 
consequence 
EM:B74110 
source 
consequence 
EM:HSCD36G14 
source 

consequence CD36_link_cds.1
EM:HSCD36G14 83 421

source isSNP SNP00041723

consequence CD36_link_cds.1
GIF CD36-genomic-fwd.gif

EM:HSCD36G4.v102.G>C



70 
A>G 

70 
92 



Missense 



Silent 
A>G 



CD36_link_cds.1
75 232 232

isSNP SNP00115780
CD36_link_cds.1
79 92

wetSNP EM:HSCD36G10.v92.T>C

CD36_link_cds.1 70 Silent
80 193 193

isSNP SNP00096574
CD36_link_cds.1
83 198

wetSNP 



64-64 Q>H 



191-191 



293-293 



A>G 



70 
203 



Silent 360-360 
AAGTAT>AT 
EM:HSCD36G14.v198.AAGTAT>AT
70 3' 
421 A>G 



70 
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TABLE 1 (Cont.) 



CD68 

Full name : CD68 antigen
Link : FL_3777141_link_cdna 

Subsequence FN: 3777141CB1 

CDS FN:3777141CB1.1 1065 bp 



ORF 
Allele 



Allele 



75 1139 

FN:3777141CB1 86 834 

source isSNP SNP00006442 

consequence FN:3777141CB1.1
FN:3777141CB1 86 1394 

source 



1558 
#87 

834 

87 

1394 



#86 



G>T 



Missense 
G>T 



254-254 



Q>K 



dbSNP gnl|dbSNP|ss450666_allele 
consequence FN:3777141CB1.1 87 3'
Allele FN:3777141CB1 86 1475 1475 G>T

source isSNP SNP00108664

consequence FN:3777141CB1.1 87 3'
GIF CD68-cdna-fwd.gif
Link : FL_1803929_link_genomic

Subsequence GB:AC007421_12 1 95240 #88

Subsequence GB:AC007421_12_3777141CD1 92493 90660

Subsequence FL_3777141_mrna_build.1 92567 90242 #90

mRNA FL_3777141_mrna_build.1

92567 92445 
91844 
91586 
91388 
91105 
90242 



1557 bp 



6 exons 



#89 
#90 



exon 
exon 
exon 
exon 
exon 
exon 



92361 
91705 
91460 
91275 
90793 

CDS GB:AC007421_12_3777141CD1 

exon 92493 92445 

92361 91844 

91705 91586 

91460 91388 

91275 91105 

90793 90660 



1065 bp 



6 exons 



#89 



exon 
exon 
exon 
exon 
exon 
Allele 



GB:AC007421_12



88 



90404 90404 G>T 



Allele 



340-340 
Allele 



254-254 
Allele 



dbSNP gnl|dbSNP|ss450666_allele
GB:AC007421_12_3777141CD1 89 3'

90707 90707 A>G
GB:AC007421_12.v90707.C>T
GB:AC007421_12_3777141CD1 89 Missense



source 
consequence 
GB:AC007421_12 88 
source wetSNP 
consequence 
A>T 

GB:AC007421_12 88 91388 91388 G>T

source wetSNP GB:AC007421_12.v91388.G>T

consequence GB:AC007421_12_3777141CD1 89 Missense

Q>K

GB:AC007421_12 88 92357 92357 A>G

source wetSNP GB:AC007421_12.v92357.C>T

consequence GB:AC007421_12_3777141CD1 89 Silent



18-18 Q 

GIF CD68-genomic-rev.gif



CDO1



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TABLE 1 (Cont.) 



Full name : cysteine dioxygenase type I 

Link : CDO1_link_cdna

Subsequence GB:HHSCYSDIO 1

CDS GB:HHSCYSDIO.1 603 bp



1556 #91 
#92 



255 857 

GB:HHSCYSDIO 91 100

source isSNP SNP00009024

consequence GB:HHSCYSDIO.1
GB:HHSCYSDIO 91 737

source isSNP SNP00048574

consequence GB:HHSCYSDIO.1
GB:HHSCYSDIO 91 784

source isSNP SNP00036859

consequence GB:HHSCYSDIO.1
GB:HHSCYSDIO 91 1082

source isSNP SNP00107326

consequence GB:HHSCYSDIO.1
GB:HHSCYSDIO 91 1525

source isSNP SNP00036860

consequence GB:HHSCYSDIO.1
GIF CDO1-cdna-fwd.gif
Link : CDO1_link_genomic



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



100 

92 
737 

92 
784 



A>G 

5' 
A>G 

Silent 
A>G 



92 Missense 

1082 A>G 

92 3 ' 

1525 A>G 

92 3' 



161-161 



177-177 



V>A 



Subsequence CDO1_cds.1 1653 4275 #93
Subsequence GB:D85778_1 1 2601 #94
Subsequence GB:D85779_1 2702 2938 #95
Subsequence GB:D85780_1 3039 3525 #96
Subsequence GB:D85781_1 3626 4090 #97
Subsequence GB:D85782_1 4191 4921 #98
Subsequence CDO1_mrna_build.1 1402 4921 #9


mRNA CDO1_mrna_build.1 1500 bp 5 exons
exon 1402 1822
exon 2789 2866
exon 3178 3332
exon 3777 3946
exon 4246 4921










CDS CDO1_cds.1 603 bp 5 exons #93
exon 1653 1822
exon 2789 2866
exon 3178 3332
exon 3777 3946
exon 4246 4275

Allele GB:D85778_1 94 1498 1498 A>G



#99 



source isSNP SNP00009024

consequence CDO1_cds.1 93 5'

Allele GB:D85781_1 97 278 278 A>G

source isSNP SNP00036859

consequence CDO1_cds.1 93 Missense

Allele GB:D85782_1 98 310 310 A>G

source isSNP SNP00107326

consequence CDO1_cds.1 93 3'

GIF CDOl-genomic-fwd.gif 



177-177 



V>A 



CGI-52 
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TABLE 1 (Cont.) 



Link : CGI-52_link_cdna

Subsequence GB:AF151810 1 1414 #100 

CDS GB:AF151810.1 1080 bp #101 

ORF 277 1356 
Allele GB:AF151810 100 1335 1335 A>G 

source isSNP SNP00054191 

consequence GB:AF151810.1 101
GIF CGI-52-cdna-fwd.gif
Link : CGI-52_link_genomic

Subsequence GB:AC023176_7 1 193672

Subsequence CGI-52_mrna_build.1 131456



Silent 



#102 
93050 



mRNA 

exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



CGI-52_mrna_build.1



1420 bp



7 exons 



353-353 



#103 
#103 



131456 
119505 
97592 97445 



131084 
119186 



96844 
96095 
93964 
93353 



96741 
95978 
93912 
93050 



GB:AC023176_7 102 93129 

source isSNP SNP00054191 

Allele GB:AC023176_7 102 93416 

source isSNP SNP00057212 

Allele GB:AC023176_7 102 131305 

source isSNP SNP00069496 

GIF CGI-52-genomic-rev.gif 



93129 A>G 



93416 A>G 



131305 



C>G



CHI3L1 

Full name : chitinase 3-like 1
Link : CHI3L1_link_cdna

Subsequence GB:NM_001276_1 1

CDS GB:NM_001276_1.1 1152 bp
ORF 127 1278

Allele GB:NM_001276_1 104 559

source isSNP SNP00008252

consequence GB:NM_001276_1.1

Allele GB:NM_001276_1 104 590

source isSNP SNP00071935

consequence GB:NM_001276_1.1

Allele GB:NM_001276_1 104 646

source isSNP SNP00022932

consequence GB:NM_001276_1.1

Allele GB:NM_001276_1 104 1300

source isSNP SNP00052666

consequence GB:NM_001276_1.1

Allele GB:NM_001276_1 104 1342

source isSNP SNP00072805

consequence GB:NM_001276_1.1

Allele GB:NM_001276_1 104 1739

source isSNP SNP00076686

consequence GB:NM_001276_1.1

GIF CHI3L1-cdna-fwd.gif
Link : CHI3L1_link_genomic




1925 
#105 

559 

105 
590 

105 
646 

105 
1300 

105 
1342 

105 
1739 

105 



#104 



A>G 

Missense 
A>G 

Missense 
G>T 

Missense 
A>G 

3 ' 
A>G 

3' 
A>G 



145-145 



155-155 



174-174 



R>G 



K>R 



L>I 
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TABLE 1 (Cont.) 



Subsequence CHI3L1_cds.1 1295 7276 #106
Subsequence CHI3L1_cds.2 1295 7433 #107
Subsequence CHI3L1_cds.3 1295 7276 #108
Subsequence CHI3L1_cds.4 1295 2802 #109
Subsequence GB:Y08374_1 1 1635 #110
Subsequence GB:Y08375_1 1736 3186 #111
Subsequence GB:Y08376_1 3287 4116 #112
Subsequence GB:Y08377_1 4217 5035 #113
Subsequence GB:Y08378_1 5136 7923 #114
Subsequence CHI3L1_mrna_build.1 1169 7923 #115
Subsequence CHI3L1_mrna_build.2 1169 7604 #116
mRNA CHI3L1_mrna_build.2 1355 bp 11 exons #116



exon 1169 1319
exon 1572 1601
exon 2036 2237
exon 2789 2845
exon 3606 3756
exon 4517 4638
exon 5436 5559
exon 6069 6251
exon 6844 6960
exon 7296 7456
exon 7548 7604

CHI3L1_cds.1 1152 bp 10 exons #106
exon 1295 1319
exon 1572 1601
exon 2036 2237
exon 2789 2845
exon 3606 3756
exon 4517 4638
exon 5436 5559
exon 6069 6251
exon 6844 6960
exon 7136 7276

CHI3L1_cds.2 1149 bp 10 exons #107
exon 1295 1319
exon 1572 1601
exon 2036 2237
exon 2789 2845
exon 3606 3756
exon 4517 4638
exon 5436 5559
exon 6069 6251
exon 6844 6960
exon 7296 7433

CHI3L1_cds.3 969 bp 9 exons #108
exon 1295 1319
exon 1572 1601
exon 2036 2237
exon 2789 2845
exon 3606 3756
exon 4517 4638
exon 5436 5559
exon 6844 6960
exon 7136 7276



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mRNA 





CHI3L1_mrna_build.1 1925 bp 10 exons #115
exon 1169 1319
exon 1572 1601
exon 2036 2237
exon 2789 2845
exon 3606 3756
exon 4517 4638
exon 5436 5559
exon 6069 6251
exon 6844 6960
exon 7136 7923

CHI3L1_cds.4 69 bp
exon 1295 1319
exon 1572 1601
exon 2789 2802



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB:Y08376_1 

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08376_1

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08377_1 

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08378_1 

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08378_1 

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08378_1 

source 

consequence 

consequence 

consequence 

consequence 

GB:Y08378_1 

source 

consequence 

consequence 



112 311 311
isSNP SNP00071934
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

112 438 438
isSNP SNP00008252
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

113 355 355
isSNP SNP00022932
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

114 506 506
isSNP SNP00005491
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

114 535 535
isSNP SNP00005492
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

114 641 641
isSNP SNP00028111
CHI3L1_cds.1
CHI3L1_cds.2
CHI3L1_cds.3
CHI3L1_cds.4

114 1560 1560
isSNP SNP00028112
CHI3L1_cds.1
CHI3L1_cds.2




#109 



G>T 

106 
107 
108 
109 
A>G 

106 
107 
108 
109 
G>T 

106 
107 
108 
109 
A>G 

106 
107 
108 
109 
A>G 

106 
107 
108 
109 
A>G 

106 
107 
108 
109 
A>G 

106 
107 



Intron 
Intron 
Intron 
3' 



Missense 
Missense 
Missense 
3' 



Missense 
Missense 
Missense 
3' 



Intron 
Intron 
Intron 
3 ' 



Intron 
Intron 
Intron 
3 ' 



Intron 
Intron 
Intron 
3' 



Intron 
Intron 



145-145 
145-145 
145-145 



174-174 
174-174 
174-174 



R>G 
R>G 
R>G 



L>I 
L>I 
L>I 
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consequence CHI3Ll_cds.3 108 

consequence CHI3L1_cds.4 109

Allele GB:Y08378_1 114 2163 2163 A>G

source isSNP SNP00052666

consequence CHI3L1_cds.1 106

consequence CHI3L1_cds.2 107

consequence CHI3L1_cds.3 108

consequence CHI3L1_cds.4 109

Allele GB:Y08378_1 114 2205 2205 A>G

source isSNP SNP00072805

consequence CHI3L1_cds.1 106

consequence CHI3L1_cds.2 107

consequence CHI3L1_cds.3 108

consequence CHI3L1_cds.4 109

Allele GB:Y08378_1 114 2602 2602 A>G

source isSNP SNP00076686

consequence CHI3L1_cds.1 106

consequence CHI3L1_cds.2 107

consequence CHI3L1_cds.3 108

consequence CHI3L1_cds.4 109

GIF CHI3L1-genomic-fwd.gif



Intron 
3 ' 



3' 

Silent 

3' 

3' 



3 ' 

Silent 

3' 

3' 



3 ' 
3 ' 
3' 
3' 



338-338 



H 



352-352 



CHI3L2 

Full name : chitinase 3-like 2 
Link : CHI3L2_link_cdna 

Subsequence GB:HSU58514 1 



1434 



CDS GB:HSU58514.1 



1173 bp 



37 1209 
GB:HSU58514 
source 
consequence 
GB:HSU58514 
source 
consequence 
GB:HSU58514 
source 
consequence 
GB:HSU58514 
source 
consequence 
GIF CHI3L2-cdna-fwd.gif 
Link : CHI3L2_alt_link_cdna 
Subsequence GB:U58515_1



ORF 
Allele 



Allele 



Allele 



Allele 



117 412 412 
isSNP SNP00021152 
GB:HSU58514.1 
117 581 581 
isSNP SNP00021153 
GB:HSU58514.1 
117 972 972 
isSNP SNP00115597 
GB:HSU58514.1 
117 1204 1204 
isSNP SNP00068229 
GB:HSU58514.1 



CDS GB:U58515_1.1 

ORF 
Allele 



1275 bp 



1 1275 
GB:U58515_1 
source 



1500 



478 



Allele 



Allele 



119 478 
isSNP SNP00021152 
consequence GB:U58515_1.1
GB:U58515_1 119 647 647 

isSNP SNP00021153 
GB:U58515_1.1 
119 1038 1038 



source 
consequence 
GB:U58515_1 
source 



#117 
#118 

A>G 

118 
A>G 

118 
A>G 

118 
A>G 

118 



#119 
#120 

A>G 

120 
A>G 

120 
A>G 



Missense 



Missense 



Silent 



Silent 



126-126 



182-182 



312-312 



390-390 



N>D 



A>V 



K 



Missense 



Missense 



160-160 



216-216 



N>D 



A>V 



isSNP SNP00115597
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consequence GB : U5 851 5_1 . 1 120 Silent 346-346 K 

Allele GB:U58515_1 119 1270 1270 A>G 

source isSNP SNP00068229 

consequence GB:U58515_1.1 120 Silent 424-424 L

GIF CHI3L2-cdna-fwd.gif 



CILP 

Full name : cartilage intermediate layer protein 
Link : CILP_link_cdna 

Subsequence GB:AF035408 1 4175



CDS GB:AF035408.1 



3555 bp 



ORF 
Allele 



Allele 



Allele 



130 3684 

GB:AF035408

source 

consequence 

GB:AF035408 

source 

consequence 

GB:AF035408 



430 



430 



source 
consequence 

Allele GB:AF035408 
source 
consequence 

Allele GB:AF035408 
source 
consequence 

GIF CILP-cdna-fwd.gif 
Link : CILP_link_genomic 

Subsequence

Subsequence 

Subsequence 

CDS CILP_cds.l 
exon 3606



121 

isSNP SNP00123071 
GB:AF035408.1 
121 1677 1677 
isSNP SNP00123072 
GB:AF035408.1 
121 3066 3066 
isSNP SNP00020276 
GB:AF035408.1 
121 3263 3263 
isSNP SNP00123073 
GB:AF035408.1 
121 3625 3625 
isSNP SNP00055164 
GB:AF035408.1



exon 
exon 
exon 
exon 
exon 
exon 
exon 

mRNA 

exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



CILP_cds.l 3606 16639 
GB:AB022430_1 1 
CILP_mrna_build.1 1911
3555 bp 8 exons

3666 
5599 5691 
6312 6581 
7897 8076 
8781 9095 
9893 10001 
11336 11493 
14271 16639 

CILP_mrna_build.l 4175 bp 



1911 

3500 

5599 

6312 

7897 

8781 

9893 

11336 

14271 



1933 

3666 

5691 

6581 

8076 

9095 

10001 

11493 

17130 



#121 
#122 

A>G 

122 
A>G 

122 
A>G 

122 
A>G 

122 
A>G 

122 



#123 
19486 
17130 
#123 



Missense 



Silent 



Silent 



Missense 



Missense 



#124 
#125 



101-101 



516-516 



979-979 



1045-1045 



1166-1166 



P>S 



Y>C 



S>G 



9 exons 



#125 



GB:AB022430_1



source 



124 3567 3567 G>T 
wetSNP GB:AB022430_1.v3567.A>C
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TABLE 1 (Cont.) 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GIF CILP- 



T>I 



R>T 



consequence CILP_cds.1 123 5'
GB:AB022430_1 124 6458 6458 A>G

source isSNP SNP00123071

consequence CILP_cds.1 123 Missense 101-101 F>S

GB:AB022430_1 124 9874

source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP
consequence CILP_cds.1
GB:AB022430_1 124

source isSNP
consequence CILP_cds.1
GB:AB022430_1 124
source wetSNP

consequence CILP_cds.1 123 Missense 678-678 V>M

GB:AB022430_1 124
source wetSNP

consequence CILP_cds.1 123 Silent
GB:AB022430_1 124 16021 16021 A>G

source isSNP SNP00020276

consequence CILP_cds.1 123 Silent
GB:AB022430_1 124 16218 16218 A>G

source isSNP SNP00123073

consequence CILP_cds.1 123 Missense 1045-1045 Y>C

GB:AB022430_1 124 16580 16580 A>G

source isSNP SNP00055164

source wetSNP GB:AB022430_1.v16580.A>G

consequence CILP_cds.1 123 Missense 1166-1166 S>G

genomic-fwd.gif



Missense 101-101
9874 A>G
GB:AB022430_1.v9874.C>T
123 Intron
9881 9881 A>G
GB:AB022430_1.v9881.C>T
123 Intron
11286 11286 A>T
GB:AB022430_1.v11286.T>A
123 Intron
11491 11491 A>G
GB:AB022430_1.v11491.C>T
123 Missense 395-395

14421 14421 C>G
GB:AB022430_1.v14421.G>C
123 Missense 446-446
14542 14542 A>G
GB:AB022430_1.v14542.G>A
123 Silent 486-486
14632 14632 A>G
SNP00123072

123 Silent 516-516
15116 15116 A>G
GB:AB022430_1.v15116.G>A
123 Missense 678-678
15670 15670 A>G
GB:AB022430_1.v15670.G>A

862-862 



979-979 



T 



R 



COL10A1 

Full name : collagen, type X, alpha 1 
Link : COL10A1_link_cdna

Subsequence GB:X60382_1 1



3226 



CDS GB:X60382_1.2 



2043 bp 



ORF 
Allele 



Allele 



16 2058 
GB:X60382_1 
source 
consequence 
GB:X60382_1



126 



95 



95 



isSNP SNP00034488 

GB:X60382_1.2

126 2294 2294 



#126 
#127 

A>G 

127 
G>T 



Missense 



27-27 T>M 
source isSNP SNP00113056

consequence GB:X60382_1.2 127 3'

GIF COL10A1-cdna-fwd.gif



COL11A2 

Full name : collagen, type XI, alpha 2
Link : FL_3421462_link_genomic 

Subsequence GB:AL031228_1 1

Subsequence COL11A2_cds.1 93988

Subsequence COL11A2_cds.2 93988

Subsequence COL11A2_cds.3 93988

Subsequence COL11A2_cds.4 93988

Subsequence COL11A2_cds.5 93988

Subsequence COL11A2_cds.6 93988

Subsequence COL11A2_cds.7 93988

Subsequence COL11A2_cds.8 93988

Subsequence COL11A2_mrna_build.1

Subsequence COL11A2_mrna_build.2

Subsequence GB:AL031228_1.20 93762

Subsequence GB:AL031228_1.21 93988

Subsequence COL11A2_mrna_build.3

mRNA GB:AL031228_1.20 6423 bp 





GB:AL031228_ 


.1.20 


exon 


93762 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


100450 


100527 


exon 


101174 


101236 


exon 


101904 


102083 


exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 



175737 
122550 
122550 
122550 
122550 
122550 
122550 
122550 
122550 

93988 122834 
93988 122834 
123536 
122550 

93769 125002 
66 exons



#128 
#129 
#130 
#131 
#132 
#133 
#134 
#135 
#136 



#139 
#140 

#139 



#137 
#138 



#141 
exon 112010

exon 112173 

exon 112302 

exon 112483 

exon 112673 

exon 112827 

exon 113115 

exon 113591 

exon 113850 

exon 114125 

exon 114408 

exon 114654 

exon 114904 

exon 115061 

exon 115311 

exon 115618 

exon 115849 

exon 116128 

exon 116344 

exon 116738 

exon 117220 

exon 117469 

exon 117656 

exon 118376 

exon 118695 

exon 118911 

exon 119105 

exon 119401 

exon 119662

exon 120022

exon 120244

exon 120412

exon 121264 

exon 121755 

exon 122410 

mRNA COL11A2_mrna

exon 93769 94341 

exon 96759 96908 

exon 97040 97250 

exon 97704 97866 

exon 99410 99601 

exon 101174 

exon 101904 

exon 105058 

exon 105223 

exon 105498

exon 105896

exon 106423 

exon 106741 

exon 106944 

exon 107102 

exon 107255 

exon 107496 

exon 107740 

exon 107876



112063 
112217 
112355 
112527 
112726 
112880 
113168 
113698 
113939 
114178 
114515 
114761 
114957 
115114 
115418 
115671 
115902 
116181 
116397 
116845 
117273 
117522 
117709 
118429 
118802 
118964 
119158 
119508 
119715 
120057 
120297 
120679 
121376 
121961 
123536 
_build. 3 



101236 
102083 
105117 
105264 
105560 
105970 
106509 
106797 
106997 
107155 
107308 
107549 
107793 
107920 



6780 bp 



66 exons 



#141 
exon


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112577 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116196 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122183 


122332 


exon 


122410 


123530 


exon 


124988 


125002 



CDS COL11A2_cds.6 5157 bp 65 exons #134

exon 93988 94069 

exon 96759 96908 

exon 97040 97250 

exon 97704 97866 

exon 99410 99601 

exon 100450 100527 

exon 101174 101236 
exon

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 



101904 

105058 

105223 

105498 

105896 

106423 

106741 

106944 

107102 

107255 

107496 

107740 

107876 

108043 

108522 

108763 

109003 

109183 

109463 

109742 

109925 

110159 

110547 

111648 

112010 

112173 

112302 

112483 

112673 

112827 

113115 

113591 

113850 

114125 

114408 

114654 

114904 

115061 

115311 

115618 

115849 

116128 

116344 

116738 

117220 

117469 

117656 

118376 

118695 

118911 

119105 

119401 

120022 

120244 

120412 



102083 

105117 

105264 

105560 

105970 

106509 

106797 

106997 

107155 

107308 

107549 

107793 

107920 

108096 

108566 

108816 

109047 

109236 

109507 

109795 

109969 

110212 

110654 

111701 

112063 

112217 

112355 

112527 

112726 

112880 

113168 

113698 

113939 

114178 

114515 

114761 

114957 

115114 

115418 

115671 

115902 

116181 

116397 

116845 

117273 

117522 

117709 

118429 

118802 

118964 

119158 

119508 

120057 

120297 

120679 
exon 121264 121376

exon 121755 121961 

exon 122410 122550 

GB:AL031228_1.21 5211 bp 66 exons #140 



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


100450 


100527 


exon 


101174 


101236 


exon 


101904 


102083 


exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 
exon


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


119662 


119715 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122550 



COL11A2_cds.7 5049 bp 64 exons #135



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


101174 


101236 


exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 
exon


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122183 


122332 


exon 


122410 


122550 



CDS COL11A2_cds.8 4986 bp 63 exons #136



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 
exon


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122183 


122332 


exon 


122410 


122550 



CDS COL11A2_cds.1 4890 bp 63 exons #129



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 
exon


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


119662 


119715 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122550 



CDS COL11A2_cds.2 4953 bp 64 exons #130



exon 


93988 


94069 




exon 


96759 


96908 




exon 


97040 


97250 




exon 


97704 


97866 




exon 


99410 


99601 




exon 


101174 




101236 


exon 


105058 




105117 


exon 


105223 




105264 


exon 


105498 




105560 


exon 


105896 




105970 


exon 


106423 




106509 
exon


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


119662 


119715 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122550 



CDS COL11A2_cds.3 5307 bp 66 exons #131

exon 93988 94069 

TABLE 1 (Cont.)



exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


100450 


100527 


exon 


101174 


101236 


exon 


101904 


102083 


exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 

exon 118911 118964 

exon 119105 119158 

exon 119401 119508 

exon 120022 120057 

exon 120244 120297 

exon 120412 120679 

exon 121264 121376 

exon 121755 121961 

exon 122183 122332 

exon 122410 122550 

mRNA COL11A2_mrna_build.1 5174 bp

exon 93988 94069 

exon 96759 96908 

exon 97040 97250 

exon 97704 97866 

exon 99410 99601 



exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 
63 exons #137



exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


119662 


119715 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122834 



CDS COL11A2_cds.4 4836 bp 62 exons #132



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 


exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 



exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

mRNA 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 



113591 

113850 

114125 

114408 

114654 

114904 

115061 

115311 

115618 

115849 

116128 

116344 

116738 

117220 

117469 

117656 

118376 

118695 

118911 

119105 

119401 

120022 

120244 

120412 

121264 

121755 

122410 



113698 

113939 

114178 

114515 

114761 

114957 

115114 

115418 

115671 

115902 

116181 

116397 

116845 

117273 

117522 

117709 

118429 

118802 

118964 

119158 

119508 

120057 

120297 

120679 

121376 

121961 

122550 



COL11A2_mrna_build.2
93988 94069 
96759 96908 
97040 97250 
97704 97866 
99410 99601 

101174 101236 

105058 105117 

105223 105264 

105498 105560 

105896 105970 

106423 106509 

106741 106797 

106944 106997 

107102 107155 

107255 107308 

107496 107549 

107740 107793 

107876 107920 

108043 108096 

108522 108566 

108763 108816

109003 109047 

109183 109236 

109463 109507 

109742 109795 

109925 109969 

110159 110212 



5237 bp 



64 exons 



#138 



exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


119662 


119715 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122834 



CDS COL11A2_cds.5 4899 bp 63 exons #133



exon 


93988 94069 




exon 


96759 96908 




exon 


97040 97250 




exon 


97704 97866 




exon 


99410 99601 




exon 


101174 


101236 


exon 


105058 


105117 


exon 


105223 


105264 


exon 


105498 


105560 


exon 


105896 


105970 


exon 


106423 


106509 


exon 


106741 


106797 


exon 


106944 


106997 


exon 


107102 


107155 


exon 


107255 


107308 


exon 


107496 


107549 


exon 


107740 


107793 



exon 


107876 


107920 


exon 


108043 


108096 


exon 


108522 


108566 


exon 


108763 


108816 


exon 


109003 


109047 


exon 


109183 


109236 


exon 


109463 


109507 


exon 


109742 


109795 


exon 


109925 


109969 


exon 


110159 


110212 


exon 


110547 


110654 


exon 


111648 


111701 


exon 


112010 


112063 


exon 


112173 


112217 


exon 


112302 


112355 


exon 


112483 


112527 


exon 


112673 


112726 


exon 


112827 


112880 


exon 


113115 


113168 


exon 


113591 


113698 


exon 


113850 


113939 


exon 


114125 


114178 


exon 


114408 


114515 


exon 


114654 


114761 


exon 


114904 


114957 


exon 


115061 


115114 


exon 


115311 


115418 


exon 


115618 


115671 


exon 


115849 


115902 


exon 


116128 


116181 


exon 


116344 


116397 


exon 


116738 


116845 


exon 


117220 


117273 


exon 


117469 


117522 


exon 


117656 


117709 


exon 


118376 


118429 


exon 


118695 


118802 


exon 


118911 


118964 


exon 


119105 


119158 


exon 


119401 


119508 


exon 


120022 


120057 


exon 


120244 


120297 


exon 


120412 


120679 


exon 


121264 


121376 


exon 


121755 


121961 


exon 


122410 


122550 



Allele 



GB:AL031228_1
source 
consequence 
consequence 
consequence 
consequence 
consequence 
consequence 
consequence 



128 



122970 



122970 



A>G 



isSNP SNP00027609 






COL11A2_cds.6


134 


3' 


GB:AL031228_1.21 


140 


3'


COL11A2_cds.7


135 


3' 


COL11A2_cds.8


136 


3' 


COL11A2_cds.1


129 


3' 


COL11A2_cds.2


130 


3' 


COL11A2_cds.3


131 


3' 




consequence COL11A2_cds.4 132
consequence COL11A2_cds.5 133
GIF COL11A2-genomic-fwd.gif



COL9A2 

Full name : collagen, type IX, alpha 2 
Link : FL_3482334_link_cdna 

Subsequence FN:3482334CB1 1

CDS FN:3482334CB1.1 2079 bp

ORF 99 2177
Allele FN:3482334CB1 142 1087

source isSNP SNP00032502

consequence FN:3482334CB1.1
Allele FN:3482334CB1 142 1113

source isSNP SNP00107342

consequence FN:3482334CB1.1
Allele FN:3482334CB1 142 1301

source isSNP SNP00107343

consequence FN:3482334CB1.1
Allele FN:3482334CB1 142 1345

source isSNP SNP00107344

consequence FN:3482334CB1.1
Allele FN:3482334CB1 142 2211

source isSNP SNP00067542

consequence FN:3482334CB1.1
Allele FN:3482334CB1 142 2317

source isSNP SNP00032503

consequence FN:3482334CB1.1
GIF COL9A2-cdna-fwd.gif
Link : FL_1651412_link_cdna

Subsequence FN:1651412CB1 1

CDS FN:1651412CB1.1 2067 bp 



2864 
#143 



143 
1113 

143
1301 

143 
1345 

143 
2211 

143 
2317 

143 



2869 
#145 



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



#142 



1087 A>G 



Missense 
C>G

Missense 
A>G 

Silent 
C>G

Missense 
A>G 

3' 
A>G 



#144 



1044 1044 A>G 



68 2134 

FN:1651412CB1 144
source isSNP SNP00032502

consequence FN:1651412CB1.1 145
FN:1651412CB1 144 1070 1070

source isSNP SNP00107342

consequence FN:1651412CB1.1 145
FN:1651412CB1 144 1258 1258

source isSNP SNP00107343

consequence FN:1651412CB1.1 145
FN:1651412CB1 144 1302 1302

source isSNP SNP00107344

consequence FN:1651412CB1.1 145
FN:1651412CB1 144 2168 2168

source isSNP SNP00067542

consequence FN:1651412CB1.1 145
FN:1651412CB1 144 2274 2274

source isSNP SNP00032503

consequence FN:1651412CB1.1 145
GIF COL9A2-cdna-fwd.gif
Link : FL_1651412_link_genomic



330-330 



339-339 



401-401 



416-416 



Q>R 



L>V 



G>A 



Missense 
C>G

Missense 
A>G 

Silent 
C>G

Missense 
A>G 

3' 
A>G 



326-326 



335-335 



397-397 



412-412 



R>Q 



L>V 



G>A 



WO 03/054166 



PCT/US02/41225 






Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 



GB:AF019406 1 17606 #146

GB:AF019406_1651412CD1 1115

GB:AF019406_3482334CD1 1115

FL_1651412_mrna_build.1 1048

FL_3482334_mrna_build.1 1017



mRNA



FL_1651412_mrna_build.1 2649 bp



17091 #147 
17091 #148 
17606 #149 
17606 #150 
32 exons #149 



exon 


1048 


1189 


exon 


2635 


2709 


exon 


3905 


3940 


exon 


4025 


4087 


exon 


5507 


5560 


exon 


5682 


5717 


exon 


5811 


5834 


exon 


6178 


6231 


exon 


6573 


6626 


exon 


6741 


6788 


exon 


7002 


7058 


exon 


7142 


7195 


exon 


7521 


7574 


exon 


7971 


8024 


exon 


8124 


8177 


exon 


8297 


8350 


exon 


10041 


10094 


exon 


10530 


10583 


exon 


10787 


10840 


exon 


12101 


12145 


exon 


12519 


12572 


exon 


13436 


13489 


exon 


13754 


13807 


exon 


13892 


13963 


exon 


14184 


14219 


exon 


14311 


14355 


exon 


14440 


14472 


exon 


14603 


14749 


exon 


15093 


15147 


exon 


15467 


15655 


exon 


16387 


16464 


exon 


16895 


17606 


GB:AF019406_3482334CD1


exon 


1115 


1189 


exon 


2635 


2709 


exon 


3905 


3940 


exon 


4025 


4087 


exon 


5507 


5560 


exon 


5682 


5717 


exon 


5811 


5834 


exon 


6178 


6231 


exon 


6573 


6626 


exon 


6741 


6800 


exon 


7002 


7058 


exon 


7142 


7195 


exon 


7521 


7574 


exon 


7971 


8024 


exon 


8124 


8177 


exon 


8297 


8350 



2079 bp 



32 exons 



#148 




exon 10041 10094 

exon 10530 10583 

exon 10787 10840 

exon 12101 12145 

exon 12519 12572 

exon 13436 13489 

exon 13754 13807 

exon 13892 13963 

exon 14184 14219 

exon 14311 14355 

exon 14440 14472 

exon 14603 14749 

exon 15093 15147 

exon 15467 15655 

exon 16387 16464 

exon 16895 17091 

mRNA FL_3482334_mrna_build.1 2692 bp



exon 


1017 


1189 


exon 


2635 


2709 


exon 


3905 


3940 


exon 


4025 


4087 


exon 


5507 


5560 


exon 


5682 


5717 


exon 


5811 


5834


exon 


6178 


6231 


exon 


6573 


6626 


exon 


6741 


6800 


exon 


7002 


7058 


exon 


7142 


7195 


exon 


7521 


7574 


exon 


7971 


8024 


exon 


8124 


8177 


exon 


8297 


8350 


exon 


10041 


10094 


exon 


10530 


10583 


exon 


10787 


10840 


exon 


12101 


12145 


exon 


12519 


12572 


exon 


13436 


13489 


exon 


13754 


13807 


exon 


13892 


13963 


exon 


14184 


14219 


exon 


14311 


14355 


exon 


14440 


14472 


exon 


14603 


14749 


exon 


15093 


15147 


exon 


15467 


15655 


exon 


16387 


16464 


exon 


16895 


17606 



CDS GB:AF019406_1651412CD1 2067 bp 32 exons #147 



exon 


1115 


1189 


exon 


2635 


2709 


exon 


3905 


3940 


exon 


4025 


4087 


exon 


5507 


5560 






exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



Q>R 

Q>R 
Allele 



5682 5717 
5811 5834 
6178 6231 
6573 6626 
6741 6788 
7002 7058 
7142 7195 
7521 7574 
7971 8024 
8124 8177 
8297 8350 
10041 10094 
10530 10583 
10787 10840 
12101 12145 
12519 12572 
13436 13489 
13754 13807 
13892 13963 
14184 14219 
14311 14355 
14440 14472 
14603 14749 
15093 15147 
15467 15655 
16387 16464 
16895 17091 
GB:AF019406 
source 
consequence 



146 10809 10809 A>G 
isSNP SNP00032502 
GB:AF019406_3482334CD1 



148 



consequence GB:AF019406_1651412CD1 147



GB:AF019406 

source 

consequence 



146 13783 13783 A>G 
isSNP SNP00107343 
GB:AF019406_3482334CD1 



G 

Allele 



consequence GB:AF019406_1651412CD1



GB:AF019406 
source 
consequence 
consequence 
GIF COL9A2-genomic-fwd



146 17229 17229 A>G 

isSNP SNP00032503 

GB:AF019406_3482334CD1

GB:AF019406_1651412CD1 

.gif



148 
147 



148 
147 



Missense 
Missense 

Silent 
Silent 



330-330 
326-326 

401-401 
397-397 



COMP 

Full name : cartilage oligomeric matrix protein 
Link : FL_1901242_link_cdna 

Subsequence FN:1901242CB1 1 2447 #151
CDS FN:1901242CB1.1 2274 bp #152
ORF 23 2296
Allele FN:1901242CB1 151 1200 1200 A>G
source isSNP SNP00017026
consequence FN:1901242CB1.1 152 Missense 393-393 S>L
Allele FN:1901242CB1 151 1319 1319 C>G
source isSNP SNP00108392
consequence FN:1901242CB1.1 152 Missense 433-433 D>H
Allele FN:1901242CB1 151 1335 1335 C>G
source isSNP SNP00017027
consequence FN:1901242CB1.1 152 Missense 438-438 G>A
Allele FN:1901242CB1 151 1777 1777 A>G
source isSNP SNP00017029
consequence FN:1901242CB1.1 152 Silent 585-585
GIF COMP-cdna-fwd.gif
Link : FL_1901242_link_genomic
Subsequence GB:AC003107 1 46275 #153
Subsequence GB:AC003107_1901242CD1 32077 23724 #154
Subsequence FL_1901242_mrna_build.1 32099 23582 #155
CDS GB:AC003107_1901242CD1 2274 bp 19 exons #154
exon 32077 31999
exon 31743 31658
exon 31421 31370
exon 30922 30750
exon 30105 29968
exon 29721 29647
exon 29558 29400
exon 29322 29218
exon 29127 29020
exon 28458 28299
exon 27459 27341
exon 27100 27048
exon 26955 26774
exon 26660 26482
exon 26355 26307
exon 25901 25705
exon 25172 25000
exon 24002 23863
exon 23770 23724
mRNA FL_1901242_mrna_build.1 2438 bp 19 exons #155
exon 32099 31999
exon 31743 31658
exon 31421 31370
exon 30922 30750
exon 30105 29968
exon 29721 29647
exon 29558 29400
exon 29322 29218
exon 29127 29020
exon 28458 28299
exon 27459 27341
exon 27100 27048
exon 26955 26774
exon 26660 26482
exon 26355 26307
exon 25901 25705
exon 25172 25000
exon 24002 23863
exon 23770 23582



Allele GB:AC003107 153 25864 25864 A>G
source isSNP SNP00017029
consequence GB:AC003107_1901242CD1 154 Silent 585-585 T
Allele GB:AC003107 153 27417 27417 A>G
source isSNP SNP00017026
consequence GB:AC003107_1901242CD1 154 Missense 393-393 S>L
Allele GB:AC003107 153 32082 32082 A>G
source isSNP SNP00017025
consequence GB:AC003107_1901242CD1 154
GIF COMP-genomic-rev.gif



CRLF1

Full name : cytokine receptor-like factor 1
Link : CRLF1_link_cdna
Subsequence GB:AF073515_1 1 1804 #156
CDS GB:AF073515_1.1 1269 bp #157
ORF 204 1472
Allele GB:AF073515_1 156 984 984 A>G
source isSNP SNP00015261
consequence GB:AF073515_1.1 157 Missense 261-261 P>S
GIF CRLF1-cdna-fwd.gif



CRP 

Full name : C-reactive protein 
Link : CRP_link_cdna 

Subsequence GB:X56214_1 1 1631 #158
CDS GB:X56214_1.1 675 bp #159
ORF 90 764
Allele GB:X56214_1 158 447 447 A>G
source isSNP SNP00100892
consequence GB:X56214_1.1 159 Missense 120-120 S>P
Allele GB:X56214_1 158 988 988 A>G
source isSNP SNP00029575
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1010 1010 A>G
source isSNP SNP00076237
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1146 1146 C>G
source isSNP SNP00076238
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1175 1175 G>T
source isSNP SNP00100893
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1406 1406 A>G
source isSNP SNP00100894
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1525 1525 A>G
source isSNP SNP00100895
consequence GB:X56214_1.1 159

GIF CRP-cdna-fwd.gif 
Link : CRP_link_genomic
Subsequence GB:HUMCRPGA 1 2480 #160
Allele GB:HUMCRPGA 160 865 865 A>G
source isSNP SNP00100892
Allele GB:HUMCRPGA 160 1404 1404 A>G
source isSNP SNP00029575
Allele GB:HUMCRPGA 160 1426 1426 A>G
source isSNP SNP00076237
Allele GB:HUMCRPGA 160 1562 1562 C>G
source isSNP SNP00076238
Allele GB:HUMCRPGA 160 1591 1591 G>T
source isSNP SNP00100893
Allele GB:HUMCRPGA 160 1822 1822 A>G
source isSNP SNP00100894
Allele GB:HUMCRPGA 160 1941 1941 A>G
source isSNP SNP00100895
Allele GB:HUMCRPGA 160 2045 2045 A>G
source isSNP SNP00100896
Allele GB:HUMCRPGA 160 2159 2159 A>G
source isSNP SNP00100897
Allele GB:HUMCRPGA 160 2260 2260 A>G
source isSNP SNP00006286





CRTL1

Full name : cartilage linking protein 1
Link : CRTL1_link_cdna
Subsequence GB:HSU43328 1 1759 #161
CDS GB:HSU43328.1 1065 bp #162
ORF 118 1182
Allele GB:HSU43328 161 801 801 C>G
source isSNP SNP00020236
consequence GB:HSU43328.1 162 Silent 228-228
Allele GB:HSU43328 161 1454 1454 A>G
source isSNP SNP00002295
consequence GB:HSU43328.1 162
GIF CRTL1-cdna-fwd.gif



CTSC 

Full name : cathepsin C 
Link : CTSC_link_cdna 

Subsequence GB:NM_001814 1 1838 #163
CDS GB:NM_001814.1 1392 bp #164
ORF 34 1425
Allele GB:NM_001814 163 491 491 A>G
source isSNP SNP00006579
consequence GB:NM_001814.1 164 Missense 153-153
Allele GB:NM_001814 163 1206 1206 G>T
source isSNP SNP00006580
consequence GB:NM_001814.1 164 Silent 391-391
Allele GB:NM_001814 163 1224 1224 A>G
source isSNP SNP00105444
consequence GB:NM_001814.1 164 Silent 397-397
GIF CTSC-cdna-fwd.gif
Link : CTSC_link_genomic
Subsequence CTSC_cds.1 150285 106619 #165
Subsequence CTSC_cds.2 150285 106619 #166
Subsequence GB:AC011088_8 1 164991 #167
Subsequence CTSC_mrna_build.1 150318 106206 #168
CDS CTSC_cds.1 1392 bp 7 exons #165
exon 150285 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 108877 108746
exon 107121 106619
CDS CTSC_cds.2 1260 bp 6 exons #166
exon 150285 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 107121 106619
mRNA CTSC_mrna_build.1 1838 bp 7 exons #168
exon 150318 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 108877 108746
exon 107121 106206
Allele GB:AC011088_8 167 106820 106820 A>G
source isSNP SNP00105444
consequence CTSC_cds.1 165 Silent 397-397 F
consequence CTSC_cds.2 166 Silent 353-353 F
Allele GB:AC011088_8 167 106838 106838 G>T
source isSNP SNP00006580
consequence CTSC_cds.1 165 Silent 391-391 T
consequence CTSC_cds.2 166 Silent 347-347 T
Allele GB:AC011088_8 167 122438 122438 A>G
source dbSNP gnl|dbSNP|ss1078568_allele
source dbSNP gnl|dbSNP|ss1088590_allele
source dbSNP gnl|dbSNP|ss382670_allele
source dbSNP gnl|dbSNP|ss403413_allele
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 124932 124932 A>T
source wetSNP GB:AC011088_8.v124932.A>T
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 125028 125028 A>G
source isSNP SNP00006579
source wetSNP GB:AC011088_8.v125028.A>G
consequence CTSC_cds.1 165 Missense 153-153 I>T
consequence CTSC_cds.2 166 Missense 153-153 I>T
Allele GB:AC011088_8 167 142996 142996 A>G
source dbSNP gnl|dbSNP|ss1530135_allele
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 150261 150261 A>G
source wetSNP GB:AC011088_8.v150261.G>A
consequence CTSC_cds.1 165 Missense 9-9 L>F
consequence CTSC_cds.2 166 Missense 9-9 L>F
Allele GB:AC011088_8 167 150303 150303 A>G
source isSNP SNP00067426
consequence CTSC_cds.1 165 5'
consequence CTSC_cds.2 166 5'
GIF CTSC-genomic-rev.gif

CTSL 

Full name : cathepsin L 
Link : CTSL_link_genomic 

Subsequence CTSL_cds.1 35962 179319 #169
Subsequence GB:AL160279_2 1 186528 #170
Subsequence CTSL_mrna_build.1 34477 179604 #171
Subsequence CTSL_cds.2 35962 179319 #172
mRNA CTSL_mrna_build.1 1577 bp 8 exons #171
exon 34477 34756
exon 35952 36087
exon 36385 36507
exon 36608 36754
exon 36943 37167
exon 37931 38093
exon 38739 38856
exon 179220 179604
CDS CTSL_cds.1 1002 bp 7 exons #169
exon 35962 36087
exon 36385 36507
exon 36608 36754
exon 36943 37167
exon 37931 38093
exon 38739 38856
exon 179220 179319
CDS CTSL_cds.2 777 bp 6 exons #172
exon 35962 36087
exon 36385 36507
exon 36608 36754
exon 37931 38093
exon 38739 38856
exon 179220 179319
Allele GB:AL160279_2 170 35919 35919 C>G
source wetSNP GB:AL160279_2.v35919.C>G
consequence CTSL_cds.1 169 5'
consequence CTSL_cds.2 172 5'
Allele GB:AL160279_2 170 36118 36118 A>G
source wetSNP GB:AL160279_2.v36118.C>T
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron

Allele GB:AL160279_2 170 36191 36191 G>T
source wetSNP GB:AL160279_2.v36191.C>A
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 44998 44998 A>G
source isSNP SNP00043782
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 45748 45748 A>G
source isSNP SNP00007530
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 45833 45833 C>G
source isSNP SNP00100366
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46188 46188 A>G
source isSNP SNP00100365
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46599 46599 C>G
source isSNP SNP00061067
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46662 46662 C>G
source isSNP SNP00100364
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 65760 65760 A>G
source isSNP SNP00048929
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 81133 81133 A>G
source dbSNP gnl|dbSNP|ss920176_allele
source dbSNP gnl|dbSNP|ss1066694_allele
source dbSNP gnl|dbSNP|ss402532_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 104937 104937 A>G
source isSNP SNP00055641
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 115466 115466 A>G
source isSNP SNP00100363
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 127655 127655 A>T
source dbSNP gnl|dbSNP|ss810769_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 149731 149731 A>G
source dbSNP gnl|dbSNP|ss1452230_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
GIF CTSL-genomic-fwd.gif



DAF

Full name : decay accelerating factor for complement
Link : DAF_link_genomic
Subsequence DAF_cds.1 131174 169024 #173
Subsequence DAF_cds.2 131174 169024 #174
Subsequence GB:AC031978_3 1 170170 #175
Subsequence DAF_mrna_build.1 131109 169897 #176
CDS DAF_cds.1 1146 bp 10 exons #173
exon 131174 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 148808 148828
exon 168960 169024
CDS DAF_cds.2 1125 bp 9 exons #174
exon 131174 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 168960 169024
mRNA DAF_mrna_build.1 2084 bp 10 exons #176
exon 131109 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 148808 148828
exon 168960 169897
Allele GB:AC031978_3 175 132041 132041 A>G
source wetSNP GB:AC031978_3.v132041.C>T
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146352 146352 A>G
source isSNP SNP00072272
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146611 146611 A>G
source isSNP SNP00072273
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146659 146659 A>G
source isSNP SNP00030860
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 165604 165604 A>G
source isSNP SNP00102533
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 165743 165743 A>G
source isSNP SNP00102534
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
GIF DAF-genomic-fwd.gif



E2F6 

Full name : E2F transcription factor 6 

Link : E2F6_link_cdna 

Subsequence GB:AF041381 1 2027 #177

Allele GB:AF041381 177 1399 1399 A>G 

source isSNP SNP00002319 



EGF 

Full name : EGF 

Link : EGF_link_cdna 

Subsequence GB:HSEGFRER 1 4871 #178
CDS GB:HSEGFRER.1 3624 bp #179
ORF 437 4060
Allele GB:HSEGFRER 178 4453 4453 A>G
source isSNP SNP00043643
consequence GB:HSEGFRER.1 179 3'
GIF EGF-cdna-fwd.gif
Link : EGF_link_genomic
Subsequence GB:AC005509 1 143391 #180
Subsequence GB:AC004050 270590 143492 #181
Subsequence EGF_cds.1 64892 166730 #182
Subsequence EGF_mrna_build.1 64456 167538 #183
CDS EGF_cds.1 3624 bp 24 exons #182
exon 64892 65018
exon 92502 92701
exon 94810 94991
exon 95398 95625
exon 96629 96831
exon 110868 110993
exon 112423 112545
exon 113419 113541
exon 114729 114854
exon 115957 116093
exon 120527 120675
exon 126259 126363
exon 127568 127791
exon 131528 131695
exon 132382 132531
exon 134978 135097
exon 139300 139416
exon 143859 143984
exon 148522 148644
exon 150008 150155
exon 154954 155121
exon 159780 159897
exon 163427 163505
exon 166477 166730
mRNA EGF_mrna_build.1 4868 bp 24 exons #183
exon 64456 65018
exon 92502 92701
exon 94810 94991
exon 95398 95625
exon 96629 96831
exon 110868 110993
exon 112423 112545
exon 113419 113541
exon 114729 114854
exon 115957 116093
exon 120527 120675
exon 126259 126363
exon 127568 127791
exon 131528 131695
exon 132382 132531
exon 134978 135097
exon 139300 139416
exon 140140 140265
exon 148522 148644
exon 150008 150155
exon 154954 155121
exon 159780 159897
exon 163427 163505
exon 166477 167538
Allele GB:AC005509 180 70903 70903 A>G
source dbSNP gnl|dbSNP|ss875266_allele
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 92638 92638 A>G
source wetSNP GB:AC005509.v92638.C>T
consequence EGF_cds.1 182 Silent 88-88
Allele GB:AC005509 180 92670 92670 A>G
source wetSNP GB:AC005509.v92670.A>G
consequence EGF_cds.1 182 Missense 99-99 Q>R
Allele GB:AC005509 180 92763 92763 A>G
source wetSNP GB:AC005509.v92763.C>T
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 94933 94933 A>G
source wetSNP GB:AC005509.v94933.C>T
consequence EGF_cds.1 182 Missense 151-151 H>Y
Allele GB:AC005509 180 95444 95444 C>G
source wetSNP GB:AC005509.v95444.G>C
consequence EGF_cds.1 182 Missense 186-186 D>H
Allele GB:AC005509 180 96578 96578 G>T
source wetSNP GB:AC005509.v96578.A>C
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 96660 96660 C>G
source wetSNP GB:AC005509.v96660.G>C
consequence EGF_cds.1 182 Missense 257-257 D>H
Allele GB:AC005509 180 96842 96842 A>G
source wetSNP GB:AC005509.v96842.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 96853 96853 A>G
source wetSNP GB:AC005509.v96853.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 100795 100795 G>T
source dbSNP gnl|dbSNP|ss48546_allele
source dbSNP gnl|dbSNP|ss569965_allele
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 112451 112451 A>G
source wetSNP GB:AC005509.v112451.T>C
consequence EGF_cds.1 182 Silent 365-365 H
Allele GB:AC005509 180 113396 113396 A>G
source wetSNP GB:AC005509.v113396.T>C
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 113521 113521 A>G
source wetSNP GB:AC005509.v113521.G>A
consequence EGF_cds.1 182 Missense 431-431 R>K
Allele GB:AC005509 180 114696 114696 A>G
source wetSNP GB:AC005509.v114696.C>T
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 126323 126323 A>G
source wetSNP GB:AC005509.v126323.A>G
consequence EGF_cds.1 182 Missense 597-597 I>V
Allele GB:AC005509 180 127715 127715 A>G
source wetSNP GB:AC005509.v127715.C>T
consequence EGF_cds.1 182 Silent 659-659
Allele GB:AC005509 180 131547 131547 A>G
source wetSNP GB:AC005509.v131547.A>G
consequence EGF_cds.1 182 Silent 691-691
Allele GB:AC005509 180 131598 131598 A>G
source wetSNP GB:AC005509.v131598.G>A
consequence EGF_cds.1 182 Missense 708-708 M>I
Allele GB:AC005509 180 131641 131641 C>G
source wetSNP GB:AC005509.v131641.G>C
consequence EGF_cds.1 182 Missense 723-723 G>R
Allele GB:AC005509 180 132511 132511 A>T
source wetSNP GB:AC005509.v132511.A>T
consequence EGF_cds.1 182 Missense 784-784 D>V
Allele GB:AC005509 180 139281 139281 A>G
source wetSNP GB:AC005509.v139281.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 139333 139333 A>G
source wetSNP GB:AC005509.v139333.T>C
consequence EGF_cds.1 182 Missense 842-842 M>T
Allele GB:AC004050 181 126737 126737 G>T
source wetSNP GB:AC004050.v126737.C>A
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 122948 122948 A>G
source isSNP SNP00118827
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 122045 122045 A>T
source wetSNP GB:AC004050.v122045.A>T
consequence EGF_cds.1 182 Missense 920-920 E>V
Allele GB:AC004050 181 110980 110980 G>T
source isSNP SNP00101773
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 110796 110796 A>G
source wetSNP GB:AC004050.v110796.A>G
consequence EGF_cds.1 182 Silent 1063-1063
Allele GB:AC004050 181 104082 104083 GC>GCC
source wetSNP GB:AC004050.v104082.GC>GCC
consequence EGF_cds.1 182 Frameshift 1134-1135
Allele GB:AC004050 181 103468 103468 A>G
source isSNP SNP00043643
consequence EGF_cds.1 182 3'
GIF EGF-genomic-fwd.gif



FDFT1

Full name : farnesyl-diphosphate farnesyltransferase 1
Link : FDFT1_link_cdna
Subsequence GB:FDFT1 1 1649 #184
CDS GB:FDFT1.1 1254 bp #185
ORF 45 1298
Allele GB:FDFT1 184 65 65 A>G
source isSNP SNP00072434
consequence GB:FDFT1.1 185 Silent 7-7
Allele GB:FDFT1 184 178 178 A>G
source isSNP SNP00065489
consequence GB:FDFT1.1 185 Missense 45-45 K>R
Allele GB:FDFT1 184 245 245 A>G
source isSNP SNP00018570
consequence GB:FDFT1.1 185 Silent 67-67 N
Allele GB:FDFT1 184 590 590 A>G
source isSNP SNP00123116
consequence GB:FDFT1.1 185 Silent 182-182
Allele GB:FDFT1 184 1016 1016 C>G
source isSNP SNP00003188
consequence GB:FDFT1.1 185 Silent 324-324
Allele GB:FDFT1 184 1220 1220 A>G
source isSNP SNP00123117
consequence GB:FDFT1.1 185 Silent 392-392
Allele GB:FDFT1 184 1532 1532 A>G
source isSNP SNP00003189
consequence GB:FDFT1.1 185
GIF FDFT1-cdna-fwd.gif
Link : FDFT1_link_genomic
Subsequence FDFT1_cds.1 5681 37973 #186
Subsequence GB:AC025857_2_000033 1 19420 #187
Subsequence GB:AC025857_2_000021 19521 25487 #188
Subsequence GB:AC025857_2_000014 29099 25588 #189
Subsequence GB:AC025857_2_000029 29200 40859 #190
Subsequence FDFT1_mrna_build.1 5639 38324 #191
mRNA FDFT1_mrna_build.1 1647 bp 8 exons #191
exon 5639 5779
exon 11642 11739
exon 12515 12698
exon 24238 24366
exon 26209 26400
exon 29608 29784
exon 30882 31034
exon 37752 38324
CDS FDFT1_cds.1 1254 bp 8 exons #186
exon 5681 5779
exon 11642 11739
exon 12515 12698
exon 24238 24366
exon 26209 26400
exon 29608 29784
exon 30882 31034
exon 37752 37973
Allele GB:AC025857_2_000033 187 5701 5701 A>G
source isSNP SNP00072434
consequence FDFT1_cds.1 186 Silent 7-7
Allele GB:AC025857_2_000033 187 6103 6103 C>G
source isSNP SNP00072231
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000033 187 11676 11676 A>G
source isSNP SNP00065489
consequence FDFT1_cds.1 186 Missense 45-45 K>R
Allele GB:AC025857_2_000014 189 2856 2856 A>G
source isSNP SNP00123116
consequence FDFT1_cds.1 186 Silent 182-182
Allele GB:AC025857_2_000029 190 1775 1775 C>G
source isSNP SNP00003188
consequence FDFT1_cds.1 186 Silent 324-324
Allele GB:AC025857_2_000029 190 5704 5704 A>G
source isSNP SNP00096026
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000029 190 8528 8528 A>G
source isSNP SNP00105147
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000029 190 8696 8696 A>G
source isSNP SNP00123117
consequence FDFT1_cds.1 186 Silent 392-392
Allele GB:AC025857_2_000029 190 9008 9008 A>G
source isSNP SNP00003189
consequence FDFT1_cds.1 186
Allele GB:AC025857_2_000029 190 9148 9148 G>T
source isSNP SNP00003190
consequence FDFT1_cds.1 186 3'
GIF FDFT1-genomic-fwd.gif



FGF1 

Full name : Fibroblast growth factor 1 (acidic) 

Link : FGF1_link_cdna
Subsequence GB:X51943_1 1 2259 #192
CDS GB:X51943_1.1 468 bp #193
ORF 35 502
Allele GB:X51943_1 192 590 590 A>G
source isSNP SNP00075582
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 785 785 G>T
source isSNP SNP00075583
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 1855 1855 A>G
source isSNP SNP00069845
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 2007 2007 C>G
source isSNP SNP00075584
consequence GB:X51943_1.1 193
GIF FGF1-cdna-fwd.gif
Link : FL_2535357_link_genomic
Subsequence GB:AC005370 1 76416 #194
Subsequence GB:AC005370_3284782CD1 45026 63860 #195
Subsequence FL_3284782_mrna_build.1 44979 67355 #196
mRNA FL_3284782_mrna_build.1 920 bp 4 exons #196
exon 44979 45194
exon 58348 58451
exon 63669 64259
exon 67347 67355
CDS GB:AC005370_3284782CD1 465 bp 3 exons #195
exon 45026 45194
exon 58348 58451
exon 63669 63860
Allele GB:AC005370 194 63951 63951 A>G
source isSNP SNP00075582
consequence GB:AC005370_3284782CD1 195
Allele GB:AC005370 194 64146 64146 G>T
source isSNP SNP00075583
consequence GB:AC005370_3284782CD1 195
Allele GB:AC005370 194 65119 65119 G>T
source isSNP SNP00012384
consequence GB:AC005370_3284782CD1 195
Allele GB:AC005370 194 65217 65217 A>G
source isSNP SNP00069845
consequence GB:AC005370_3284782CD1 195
Allele GB:AC005370 194 65369 65369 C>G
source isSNP SNP00075584
consequence GB:AC005370_3284782CD1 195
Allele GB:AC005370 194 66005 66005 A>G
source isSNP SNP00045433
consequence GB:AC005370_3284782CD1 195
GIF FGF1-genomic-fwd.gif



FGF2 

Full name : fibroblast growth factor 2 (basic) 

Link : FGF2_link_cdna 

Subsequence GB:FGF2 1 6757 #197
CDS GB:FGF2.1 633 bp #198
ORF 302 934
Allele GB:FGF2 197 1651 1651 G>T
source isSNP SNP00023270
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 1691 1691 A>G
source isSNP SNP00058183
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 4603 4603 A>G
source isSNP SNP00036340
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 4909 4909 A>G
source isSNP SNP00036341
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 5455 5455 A>G
source isSNP SNP00123025
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 5466 5466 C>G
source isSNP SNP00036342
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 5892 5892 G>T
source isSNP SNP00062439
consequence GB:FGF2.1 198 3'
Allele GB:FGF2 197 5937 5937 A>G
source isSNP SNP00062440
consequence GB:FGF2.1 198 3'
GIF FGF2-cdna-fwd.gif



FGFR1 

Full name : Fibroblast growth factor receptor-1
Link : FGFR1_link_cdna
Subsequence GB:M34185_1 1 3365 #199
CDS GB:M34185_1.1 2202 bp #200
ORF 256 2457
Allele GB:M34185_1 199 1471 1471 A>G
source isSNP SNP00107960
consequence GB:M34185_1.1 200 Missense 406-406 A>T
Allele GB:M34185_1 199 3224 3224 G>T
source isSNP SNP00107961
consequence GB:M34185_1.1 200
GIF FGFR1-cdna-fwd.gif



FMOD 

Full name : fibromodulin 

Link : FMOD_link_cdna 

Subsequence GB:FMOD 1 2863 #201
CDS GB:FMOD.1 1131 bp #202
ORF 21 1151
Allele GB:FMOD 201 2653 2653 C>G
source isSNP SNP00001499
consequence GB:FMOD.1 202 3'
Allele GB:FMOD 201 2739 2739 A>G
source isSNP SNP00001500
consequence GB:FMOD.1 202 3'
GIF FMOD-cdna-fwd.gif 
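
The cDNA entries in this table pair an ORF start and end with each allele position and then report either a codon number (for example "393-393") or a flanking label (5' or 3'). The sketch below shows one way that relationship can be reproduced, assuming 1-based inclusive coordinates and an ORF that starts at the first base of codon 1. The function name `locate_in_orf` is hypothetical and is not part of the source; the numeric inputs are taken from the FMOD and COMP cDNA entries of this table.

```python
def locate_in_orf(position, orf_start, orf_end):
    """Classify a 1-based cDNA position relative to an ORF.

    Returns ("5'", None) or ("3'", None) for flanking positions,
    otherwise ("coding", codon_number) with codon 1 starting at orf_start.
    Assumption: coordinates are inclusive, as the bp totals in the table suggest.
    """
    if position < orf_start:
        return ("5'", None)
    if position > orf_end:
        return ("3'", None)
    codon = (position - orf_start) // 3 + 1
    return ("coding", codon)


# FMOD: ORF 21..1151, allele at 2653 lies downstream of the ORF.
fmod_result = locate_in_orf(2653, 21, 1151)

# COMP cDNA: ORF 23..2296, allele at 1200 is listed as codon 393.
comp_result = locate_in_orf(1200, 23, 2296)

print(fmod_result, comp_result)
```

The same arithmetic matches the other cDNA rows checked here (for instance position 1777 in COMP falls in codon 585), which is why the inclusive, 1-based reading is assumed.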



FRZB 

Full name : frizzled-related protein
Link : FRZB_link_cdna
Subsequence GB:U91903_1 1 1909 #203
CDS GB:U91903_1.1 978 bp #204
ORF 70 1047
Allele GB:U91903_1 203 667 667 A>G
source isSNP SNP00016790
consequence GB:U91903_1.1 204 Missense 200-200 R>W
Allele GB:U91903_1 203 1039 1039 C>G
source isSNP SNP00001065
consequence GB:U91903_1.1 204 Missense 324-324 R>G
Allele GB:U91903_1 203 1259 1259 A>G
source isSNP SNP00001066
consequence GB:U91903_1.1 204
Allele GB:U91903_1 203 1305 1305 A>G
source isSNP SNP00016791
consequence GB:U91903_1.1 204
GIF FRZB-cdna-fwd.gif



FST 

Full name : Follistatin 
Link : FST_link_cdna
Subsequence GB:FST 1 954 #205
CDS GB:FST.1 954 bp #206
ORF 1 954
Allele GB:FST 205 454 454 A>G
source isSNP SNP00015508
consequence GB:FST.1 206 Missense 152-152 E>K
Allele GB:FST 205 853 853 C>G
source isSNP SNP00052278
consequence GB:FST.1 206 Missense 285-285 A>P
GIF FST-cdna-fwd.gif
Link : FST_link_genomic
Subsequence FST_cds.1 77877 73442 #207
Subsequence GB:AC008901_2 1 192639 #208
Subsequence FST_mrna_build.1 77877 73440 #209
CDS FST_cds.1 951 bp 5 exons #207
exon 77877 77793
exon 75788 75597
exon 75164 74946
exon 74599 74375
exon 73671 73442
mRNA FST_mrna_build.1 953 bp 5 exons #209
exon 77877 77793
exon 75788 75597
exon 75164 74946
exon 74599 74375
exon 73671 73440
Allele GB:AC008901_2 208 73454 73454 A>G
source wetSNP GB:AC008901_2.v73454.G>A
consequence FST_cds.1 207 Silent 313-313
Allele GB:AC008901_2 208 73540 73540 C>G
source isSNP SNP00052278
consequence FST_cds.1 207 Missense 285-285 A>P
Allele GB:AC008901_2 208 74988 74988 A>G
source isSNP SNP00015508
consequence FST_cds.1 207 Missense 152-152 E>K
Allele GB:AC008901_2 208 76361 76361 C>G
source dbSNP gnl|dbSNP|ss42460_allele
consequence FST_cds.1 207 Intron
Allele GB:AC008901_2 208 76373 76373 A>G
source dbSNP gnl|dbSNP|ss1048607_allele
source dbSNP gnl|dbSNP|ss226044_allele
consequence FST_cds.1 207 Intron
Allele GB:AC008901_2 208 76384 76384 A>G
source dbSNP gnl|dbSNP|ss839844_allele
consequence FST_cds.1 207 Intron
GIF FST-genomic-rev.gif



G0S2 

Full name : putative lymphocyte G0/G1 switch gene
Link : FL_3732868_link_genomic
Subsequence GB:HS28O10 1 97700 #210
Subsequence GB:HS28O10_3732868CD1 52369 52680 #211
Subsequence FL_3732868_mrna_build.1 52008 53073 #212
mRNA FL_3732868_mrna_build.1 963 bp 2 exons #212
exon 52008 52233
exon 52337 53073
CDS GB:HS28O10_3732868CD1 312 bp 1 exon #211
exon 52369 52680
Allele GB:HS28O10 210 52341 52341 A>G
source isSNP SNP00039143
source wetSNP GB:HS28O10.v52341.T>C
consequence GB:HS28O10_3732868CD1 211 5'
GIF G0S2-genomic-fwd.gif 
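
Each CDS and mRNA row in this table states a length in bp followed by its exon coordinates. As a consistency check, the stated length should equal the sum of the inclusive exon lengths. The following is a minimal sketch of that check, assuming inclusive coordinates and allowing start > end for reverse-strand entries; the function name `spliced_length` is hypothetical, and the coordinates are copied from the G0S2 and CTSL entries of this table.

```python
def spliced_length(exons):
    """Sum inclusive exon lengths for a list of (start, end) pairs.

    abs() lets the same function handle reverse-strand entries,
    where the table lists the larger coordinate first.
    """
    return sum(abs(end - start) + 1 for start, end in exons)


# G0S2 mRNA (stated: 963 bp, 2 exons) and CDS (stated: 312 bp, 1 exon).
g0s2_mrna = [(52008, 52233), (52337, 53073)]
g0s2_cds = [(52369, 52680)]

# CTSL_cds.1 (stated: 1002 bp, 7 exons).
ctsl_cds_1 = [
    (35962, 36087), (36385, 36507), (36608, 36754), (36943, 37167),
    (37931, 38093), (38739, 38856), (179220, 179319),
]

for name, exons in [("G0S2 mRNA", g0s2_mrna),
                    ("G0S2 CDS", g0s2_cds),
                    ("CTSL_cds.1", ctsl_cds_1)]:
    print(name, spliced_length(exons))
```

A mismatch between the computed sum and the stated bp value would point to a coordinate damaged in transcription rather than to a property of the gene itself.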



GADD34 

Full name : growth arrest and DNA- damage- inducible 34 
Link : GADD34_link_cdna 

Subsequence GB:HSU83 9 81 1 2331 



CDS GB:HSU839 81.1 



ORF 
Allele 



Allele 



223 2247 

GB:HSU83 981 

source 

consequence 

GB:HSU83981 

source 

consequence 



2025 bp 



205 



205 



213 

isSNP SNP00116263 
GB:HSU83981.1 
213 314 314 
isSNP SNP00116264 
GB:HSU83 9 81.1 

203 



#213 
#214 

A>G 

214 
A>G 

214 



Missense 



31-31 R>H 
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Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB:HSU83 9 81 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU839 81 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU83981 

source 

consequence 

GB:HSU83981 

source 



consequence 
GIF GADD3 4 -cdna- f wd . g i f 
Link : GADD34_link_genomic 

Subsequence GADD34__cds . 1 

Subsequence GB : AC026803_2 

Sub s equ enc e GADD3 4_mrna_bu i 1 d 

irtRNA GADD3 4„mrna_bu i 1 d . 1 

exon 220595 

exon 2213 81 

exon 223770 
CDS GADD3 4_cds . 1 

exon 221390 

exon 
Allele 



213 316 316 A>G 
isSNP SNP00029694 

GB:HSU83981.1 214 Missense 32-32 A>T 

213 974 974 C>G 
isSNP SNP00006368 

GB:HSU83981.1 214 Missense 251-251 

213 1051 1051 A>G 
isSNP SNP00006369 

GB:HSU83981.1 214 Missense 277-277 

213 1156 1156 A>G 
isSNP SNP00006370 

GB:HSU83981.1 214 Missense 312-312 

213 1605 1605 A>G 
isSNP SNP00069978 

GB:HSU83981.1 214 Silent 461-461 

213 1650 1650 G>T 
isSNP SNP00069979 

GB:HSU83981.1 214 Missense 476-476 

213 2011 2011 A>G 
isSNP SNP00006372 

GB:HSU83981.1 214 Missense 597-597 

213 2184 2184 A>G 
isSNP SNP00006373 

GB:HSU83981.1 214 Silent 654-654 

213 2199 2199 C>G
isSNP SNP00006374 

GB:HSU83981.1 214 Silent 659-659 



R>P 



K>E 



G>S 



R>S 



T>A 



221390 224129 
1 247509 #216 

1 220595 224213 



#215 



#217 



2331 bp 



Allele 



Allele 



Allele 



Allele 



223770 
GB: AC026803. 
source 
consequence 
GB:AC026803. 
source 
consequence 
GB:AC026803, 
source 
consequence 



220807 
223054 
224213 

2025 bp 2 exons

223054 
224129 

2 216 221481 

isSNP SNP00116264 
GADD34_cds.1 215
2 216 221483

isSNP SNP00029694
GADD34_cds.1 215



3 exons 



#215 



221481 

Missense 
221483 



#217 



A>G 

31-31 
A>G 



R>H 



,2 216 
wetSNP 
GADD34_cds 
GB:AC026803_2 216 
source wetSNP 
consequence GADD34_cds 
GB:AC026803_2 216
source 
source 



Missense 32-32 A>T 

221941 221941 A>G 

GB:AC026803_2.v221941.G>A
1 215 Silent 184-184

221985 221985 A>G

GB:AC026803_2.v221985.T>C
1 215 Missense 199-199

222141 222141 C>G

isSNP SNP00006368

wetSNP GB:AC026803_2.v222141.G>C



consequence GADD34_cds.1



215 



Missense 



251-251 



V>A 



R>P 



Allele GB:AC026803_2

source isSNP 
consequence GADD34. 

Allele GB:AC026803_2 

source isSNP 
consequence GADD34. 

Allele GB:AC026803_2 

source isSNP 
consequence GADD34.

Allele GB:AC026803_2

source isSNP
consequence GADD34.

Allele GB:AC026803_2

source isSNP
consequence GADD34.

Allele GB:AC026803_2

source isSNP
consequence GADD34.

Allele GB:AC026803_2

source isSNP
consequence GADD34.

GIF GADD34-genomic-fwd.gif



216 222218
SNP00006369
_cds.1 215
216 222323
SNP00006370
_cds.1 215
216 222772
SNP00069978
_cds.1 215
216 222817
SNP00069979
_cds.1 215
216 223893
SNP00006372
_cds.1 215
216 224066
SNP00006373
_cds.1 215
216 224081
SNP00006374
_cds.1 215



222218 

Missense 
222323 

Missense 
222772 

Silent 
222817 

Missense 
223893 

Missense 
224066 

Silent 
224081 

Silent 



A>G 

277-277 
A>G 

312-312 
A>G 

461-461 
G>T 

476-476 
A>G 

597-597 
A>G 

654-654 
C>G

659-659 



K>E 



G>S 



R>S 



T>A 



GLI 

Full name : glioma-associated oncogene homolog 

Link : GLI_link_cdna 

Subsequence GB:NM_005269_1 1 3600

CDS GB:NM_005269_1.1 3321 bp #219



79 3399

GB:NM_005269_1 218
source isSNP SNP00018615

consequence GB:NM_005269_1.1
GB:NM_005269_1 218 2202

source isSNP SNP00072776

consequence GB:NM_005269_1.1
GB:NM_005269_1 218 2876

source isSNP SNP00112595

consequence GB:NM_005269_1.1
GB:NM_005269_1 218 3243

source isSNP SNP00018616

consequence GB:NM_005269_1.1
GB:NM_005269_1 218 3376

source isSNP SNP00018617

consequence GB:NM_005269_1.1
GIF GLI-cdna-fwd.gif 



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



219 
2202 

219 
2876 

219 
3243 

219 
3376 

219 



#218 



2179 2179 A>G 



Missense 
A>G 

Silent 
A>G 

Missense 
C>G 

Missense 
C>G

Missense 



701-701 



708-708 



933-933 



1055-1055 



1100-1100 



R>G 



E 



G>D 



E>D 



E>Q 



GLI3 

Full name : GLI-Kruppel family member GLI3 

Link : GLI3_link_cdna 

Subsequence GB:NM_000168_1 1 5046 #220

CDS GB:NM_000168_1.1 4791 bp #221
ORF 55 4845

Allele GB:NM_000168_1 220 4502 4502 A>G

source isSNP SNP00031650

consequence GB:NM_000168_1.1 221 Missense 1483-1483 G>D

Allele GB:NM_000168_1 220 4663 4663 A>G

source isSNP SNP00073523

consequence GB:NM_000168_1.1 221 Missense 1537-1537 R>C

GIF GLI3-cdna-fwd.gif 



HAS1

Full name : hyaluronan synthase 1

Link : HAS1_link_cdna

Subsequence GB:NM_001523

CDS GB:NM_001523.1 1737 bp



2088 
#223 



#222 



ORF 
Allele 



Allele 



36 1772 

GB:NM_001523 222 75 75 A>G 

source isSNP SNP00096015

consequence GB:NM_001523.1 223 Missense

GB:NM_001523 222 1889 1889 G>T

source isSNP SNP00064738

consequence GB:NM_001523.1 223 3'

GIF HAS1-cdna-fwd.gif
Link : HAS1_link_genomic

Subsequence HAS1_cds.1 153154 142648 #224

Subsequence GB:AC018755_2 1 231222 #225

Subsequence HAS1_mrna_build.1 153189 142333

CDS HAS1_cds.1 1737 bp 5 exons #224

exon 153154 153146
149119 148427
146414 146189
145609 145477
143323 142648

HAS1_mrna_build.1 2087 bp 5 exons #226

153189 153146
149119 148427
146414 146189
145609 145477
143323 142333

GB:AC018755_2 225 142531 142531

source isSNP SNP00064738

consequence HAS1_cds.1 224 3'
GB:AC018755_2 225 147775 147775

source dbSNP gnl | dbSNP | ss715930_allele

consequence HAS1_cds.1 224 Intron
GB:AC018755_2 225 149089 149089

source isSNP SNP00096015

consequence HAS1_cds.1 224 Missense 14-14
GB:AC018755_2 225 149293 149293

source dbSNP gnl | dbSNP | ss713606_allele

consequence HAS1_cds.1 224 Intron
GIF HAS1-genomic-rev.gif



14-14 R>C 



#226 



exon 
exon 
exon 
exon 

mRNA 

exon 
exon 
exon 
exon 
exon 
Allele 



Allele 



Allele 



Allele 



G>T 



G>T 



A>G 

OR 
OG 
HAS2

Full name : hyaluronan synthase 2

Link : HAS2_link_cdna

Subsequence GB:NM_005328

CDS GB:NM_005328.1 1659 bp



536 2194

GB:NM_005328 227 381

source isSNP SNP00072998

consequence GB:NM_005328.1
GB:NM_005328 227 1357

source isSNP SNP00104961

consequence GB:NM_005328.1
GIF HAS2-cdna-fwd.gif 



ORF 
Allele 



Allele 



3003 
#228 

381 

228 
1357 

228 



#227 

A>G 

5' 
G>T 

Missense 



274-274 



F>L 



HSPG2 

Full name : heparan sulfate proteoglycan 2 

Link : HSPG2_link_cdna 

Subsequence GB:NM_005529_2 1

CDS GB:NM_005529_2.1 13182 bp



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



13793 
#230 



#229 



41 13222 

GB:NM_005529_2 229 2155

source isSNP SNP00054627

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 2340

source isSNP SNP00054628

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 3603

source isSNP SNP00109135

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 3734

source isSNP SNP00109136

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 3943

source isSNP SNP00054629

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 4032

source isSNP SNP00054630

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 4554

source isSNP SNP00109138

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 7042

source isSNP SNP00048871

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 7503

source isSNP SNP00109139

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 9548

source isSNP SNP00109140

consequence GB:NM_005529_2.1
GB:NM_005529_2 229 10294

source isSNP SNP00109141

consequence GB:NM_005529_2.1



2155 A>G 



230 
2340 

230
3603 

230 
3734 

230 
3943 

230 
4032 

230 
4554 

230 
7042 

230 
7503 

230 
9548 

230 
10294 

230 



Silent 
A>G 

Missense 
A>G 

Missense 
A>G 

Missense 
A>G 

Silent 
A>G 

Missense 
A>G 

Missense 
A>G 

Silent 
A>G 

Missense 
A>G 

Missense 
A>G 

Silent 



705-705 



767-767 



1188-1188 



1232-1232 



1301-1301 



1331-1331 



1505-1505 



2334-2334 



2488-2488 



3170-3170 



3418-3418 



S>N 



R>Q 



G>S 



G>D 



V>A 



N 



S>L 



T>A 
Allele GB:NM_005529.
source 
consequence 

Allele GB:NM_005529. 
source 
consequence 

Allele GB:NM_005529, 
source 
consequence 

Allele GB:NM_005529
source 
consequence 

Allele GB:NM_005529 
source 
consequence 

GIF HSPG2-cdna-fwd.gif 



_2 229 10663

isSNP SNP00109142
GB:NM_005529_2.1
_2 229 10941

isSNP SNP00109143
GB:NM_005529_2.1
_2 229 11233

isSNP SNP00009830
GB:NM_005529_2.1
_2 229 12358

isSNP SNP00009831
GB:NM_005529_2.1
_2 229 12604

isSNP SNP00038416
GB:NM_005529_2.1



10663 A>G 

230 Silent 

10941 A>G 

230 Missense 

11233 G>T 

230 Silent 

12358 A>G 

230 Silent 

12604 A>G 



230 



Silent 



3541-3541 



3634-3634 



3731-3731 



4106-4106 



4188-4188 



V 



Q>R 



V 



IBSP 

Full name : IBSP 
Link : IBSP_link_cdna

Subsequence GB:HUMSIALO



1037



CDS GB:HUMSIALO.1



954 bp 



ORF 
Allele 



Allele 



Allele 



72 1025 
GB:HUMSIALO
source
consequence
GB:HUMSIALO
source
consequence
GB:HUMSIALO
source
consequence
GIF IBSP-cdna-fwd.gif
Link : IBSP_link_genomic
Subsequence
Subsequence
Subsequence
Subsequence
Subsequence
CDS IBSP_cds.1
exon 2863 



231 494 494 
isSNP SNP00065793 
GB:HUMSIALO.1
231 655 655
isSNP SNP00065794
GB:HUMSIALO.1
231 709 709
isSNP SNP00018906
GB:HUMSIALO.1



GB:HUMBNSP01 1 
GB:HUMBNSP02 2516 
GB:HUMBNSP03 3460 
GB:HUMBNSP04 5195 
IBSP_cds.1 2863 7195
954 bp 6 exons 

2916 
3009 3059 
3158 3235 
3571 3633 
5882 6040 
6647 7195 

GB:HUMBNSP04 236 1631

source isSNP SNP00065794

consequence IBSP_cds.1 237
GB:HUMBNSP04 236 1685

source isSNP SNP00018906

consequence IBSP_cds.1 237
GIF IBSP-genomic-fwd.gif 




exon 
exon 
exon 
exon 
exon 
Allele 



Allele 



#231 
#232 

A>G 

232 
A>G 

232 
A>G 

232 



2415 
3359 
5094 
9497 
#237 
#237 



Silent 



Missense 



Missense 



#233 
#234 
#235 
#236 



141-141 



195-195 



213-213 



N 



G>E 



G>D 



1631 A>G 

Missense 
1685 A>G 

Missense 



195-195 



213-213 



E>G 



G>D 
IER3

Full name : immediate early response 3
Link : IER3_link_cdna

Subsequence GB:Y14551_1 1 1230

CDS GB:Y14551_1.1 471 bp

ORF 12 482
Allele GB:Y14551_1 238 838 838

source isSNP SNP00052893

consequence GB:Y14551_1.1
GIF IER3-cdna-fwd.gif
Link : FL_758754_link_genomic

Subsequence GB:AC006165 1 44118 #240

Subsequence GB:AC006165_2619577CD1

Subsequence FL_2619577_mrna_build.1



#238 
#239 

A>G 

239 



14601 15183 #241 
14585 15920 #242 
mRNA FL_2619577_mrna_build.1 1224 bp 2 exons

exon 14585 14810 
exon 14923 15920 
CDS GB:AC006165_2619577CD1 471 bp 2 exons #241 

exon 14601 14810 
exon 14923 15183 
Allele GB:AC006165 240 15539 15539 A>G 

source isSNP SNP00052893 

consequence GB:AC006165_2619577CD1 241 3'
GIF IER3-genomic-fwd.gif 



#242 



IHH 

Full name : IHH 
Link : IHH_link_cdna

Subsequence GB:HUMIHH 1 1277 #243

CDS GB:HUMIHH.2 939 bp #244

ORF 2 940
Allele GB:HUMIHH 243 457 457 A>G

source isSNP SNP00097225

consequence GB:HUMIHH.2 244 Silent
GIF IHH-cdna-fwd.gif
Link : IHH_link_genomic 



152-152 



Subsequence IHH_cds.1 1 1469 #245
Subsequence GB:AB010092_1 1 315 #246
Subsequence GB:AB018075_1 416 698 #247
Subsequence GB:AB018076_1 799 1481 #248
CDS IHH_cds.1 1236 bp 3 exons #245
exon 1 315
exon 426 687
exon 811 1469
Allele GB:AB018075_1 247 194 194 A>G



Allele 



source 

consequence 

GB:AB018076. 

source 

consequence 



wetSNP GB:AB018075_1.v194.G>A
IHH_cds.1 245 Missense 167-167
1 248 188 188 A>G

isSNP SNP00097225

IHH_cds.1 245 Silent 251-251



A>T 



GIF IHH-genomic-fwd.gif 
INHBA

Full name : inhibin, beta A
Link : FL_3526170_link_cdna

Subsequence FN:3526170CB1 1 1620

CDS FN:3526170CB1.1 1281 bp #250

ORF 216 1496
Allele FN:3526170CB1 249 607 607

source isSNP SNP00068777

consequence FN:3526170CB1.1 250
GIF INHBA-cdna-fwd.gif
Link : FL_3526170_link_genomic

Subsequence GB:AC005027 1 199878

Subsequence GB:AC005027_3526170CD1 16865

Subsequence FL_3526170_mrna_build.1 14163



#249 



G>T 



Missense 



#251 

54957 #252 
55081 #253 
3 exons 



131-131 



T>K 



mRNA FL_3526170_mrna_build.1 1620 bp

exon 14163 14234 
exon 16722 17252 
exon 54065 55081 
CDS GB:AC005027_3526170CD1 1281 bp 2 exons #252 

exon 16865 17252 
exon 54065 54957 
Allele GB:AC005027 251 16377 16377 A>G 

source dbSNP gnl | dbSNP | ss577365_allele

source dbSNP gnl | dbSNP | ss588511_allele

consequence GB:AC005027_3526170CD1 252 5'
GIF INHBA-genomic-fwd.gif 



#253 



IRS1 

Full name : Insulin receptor substrate 1

Link : IRS1_link_cdna

Subsequence EM:S62539 1 5828

CDS EM:S62539.1 3729 bp #255



ORF 
Allele 



Allele 



Allele 



1021 4749 
EM:S62539 
source 
consequence 
EM:S62539 
source 
consequence 
EM:S62539 
source 
consequence 
GIF IRS1-cdna-fwd.gif
Link : IRS1_link_genomic
Subsequence
Subsequence
Subsequence
CDS IRS1_cds.1
exon 680

mRNA 

exon 



254 3388 3388 
isSNP SNP00067005 
EM:S62539.1 255 
254 3887 3887 
isSNP SNP00114530 
EM:S62539.1 255
254 5156 5156
isSNP SNP00067006
EM:S62539.1 255



EM:S85963 100 6251 

IRS1_cds.1 680 4411
IRS1_mrna_build.1 100

3732 bp 1 exon
4411

IRS1_mrna_build.1 4333 bp
100 4432 



#254 



A>G 

Missense 
A>G 

Missense 
G>T 



#256 
#257 
4432 
#257 



790-790 



956-956 



R>C 



E>G 



#258 



1 exon 



#258 
Allele



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 

EM:S85963 

source 

consequence 



256 850
wetSNP
IRS1_cds.1
256 1285
wetSNP
IRS1_cds.1
256 1783
wetSNP
IRS1_cds.1
256 2023
wetSNP
IRS1_cds.1
256 2117
wetSNP
IRS1_cds.1
256 2697
wetSNP
IRS1_cds.1
256 2941
wetSNP
IRS1_cds.1
256 2951
isSNP SNP00067005
IRS1_cds.1 257
256 2995
wetSNP
IRS1_cds.1
256 3035
wetSNP
IRS1_cds.1
256 3262
wetSNP
IRS1_cds.1
256 3349
wetSNP
IRS1_cds.1
256 3450
isSNP SNP00114530
IRS1_cds.1 257
256 3494
wetSNP
IRS1_cds.1
256 4053
wetSNP
IRS1_cds.1



850 A>G
EM:S85963.v850.C>T
257 Silent 90-90 D

1285 A>G

EM:S85963.v1285.G>A
257 Silent 235-235
1783 A>G

EM:S85963.v1783.T>C
257 Silent 401-401
2023 A>G

EM:S85963.v2023.C>T
257 Silent 481-481
2117 C>G

EM:S85963.v2117.G>C
257 Missense 513-513
2697 A>G

EM:S85963.v2697.G>A
257 Missense 706-706
2941 A>G

EM:S85963.v2941.T>C
257 Silent 787-787
2951 A>G



Missense 791-791
2995 A>G

EM:S85963.v2995.A>G
257 Silent 805-805
3035 C>G

EM:S85963.v3035.G>C
257 Missense 819-819
3262 C>G

EM:S85963.v3262.G>C
257 Silent 894-894
3349 A>G

EM:S85963.v3349.G>A
257 Silent 923-923
3450 A>G

Missense 957-957
3494 A>G

EM:S85963.v3494.G>A
257 Missense 972-972
4053 A>G

EM:S85963.v4053.G>A
257 Missense 1158-1158



H 



N 



A>P 



G>D 



H 



R>C 



G>R 



E>G 



G>R 



G>E 



GIF IRS1-genomic-fwd.gif



JUN 

Full name : v-jun avian sarcoma virus 17 oncogene homolog 

Link : JUN_link_genomic

Subsequence JUN_cds.1 9468 8473 #259

Subsequence GB:AL136985_1 1 151212 #260

Subsequence JUN_mrna_build.1 9468 8473 #261
CDS JUN_cds.1 996 bp 1 exon #259

exon 9468 8473
mRNA JUN_mrna_build.1 996 bp 1 exon #261

exon 9468 8473 
GIF JUN-genomic-rev.gif 



KJ_OA11

Full name : KIAA1253

Link : FL_2135776_link_cdna

Subsequence FN:2135776CB1

CDS FN:2135776CB1.1 1197 bp



1 3129
#263

256 1452

FN:2135776CB1 262 59 59

source isSNP SNP00100733

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1352 1352

source isSNP SNP00116557

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1477 1477

source isSNP SNP00042286

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1489 1489

source isSNP SNP00042287

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1667 1667

source isSNP SNP00011480

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1710 1710

source isSNP SNP00011481

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 1838 1838

source isSNP SNP00011482

consequence FN:2135776CB1.1 263
FN:2135776CB1 262 2589 2589

source isSNP SNP00003671

consequence FN:2135776CB1.1 263
GIF KJ_OA11-cdna-fwd.gif
Link : FL_2135776_link_genomic

Subsequence GB:HS425C14 1 160203

Subsequence GB:HS425C14_2135776CD1 55766

Subsequence FL_2135776_mrna_build.1 69012

Subsequence KJ_OA11_cds.1 55766 51052

CDS GB:HS425C14_2135776CD1 1197 bp 9 exons



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



#262 



C>G

5' 

A>G 

Missense 
A>G 

3' 
A>G 

3' 
A>G 

3' 
A>G 

3'
A>G 

3' 
A>G 
Q>R



#264 
42255 
40562 
#267 



#265 
#266 

#265 



mRNA 



exon 
55731


exon 


53861 


53692 


exon 


51441 


51362 


exon 


51118 


50981 


exon 


49268 


49099 


exon 


48965 


48875 


exon 


44476 


44332 


exon 


44215 


43985 


exon 


42390 


42255 




FL_2135776_i 



3119 bp 



10 exons 



#266 
exon


69012 


68910 


exon 


55892 


55731 


exon 


53861 


53692 


exon 


51441 


51362 


exon 


51118 


50981 


exon 


49268 


49099 


exon 


48965 


48875 


exon 


44476 


44332 


exon 


44215 


43985 


exon 


42390 


40562 


KJ_OA11_cds.1


exon 


55766 


55731 


exon 


53861 


53692 


exon 


51118 


51052 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

source 

consequence 

consequence 

GB:HS425C14 

source 

source 

consequence 



273 bp 



3 exons 



264 41092 41092 A>G
isSNP SNP00003671
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 41843 41843 A>G
isSNP SNP00011482
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 41971 41971 A>G
isSNP SNP00011481
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42014 42014 A>G
isSNP SNP00011480
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42192 42192 A>G
isSNP SNP00042287
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42204 42204 A>G
isSNP SNP00042286
wetSNP GB:HS425C14.
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42294 42294 C>G
wetSNP GB:HS425C14.

wetSNP GB:HS425C14.
GB:HS425C14_2135776CD1



#267 



265 3'
3'



265
3'



265
3'



265 3'
3'



265 3'
3'



v42204.G>A
265 3'
3' 

v42294.G>C
v42294.G>C 
265 Silent 



386-386 



Allele 



S>G 



Allele 



Allele 



consequence 
GB:HS425C14 
source 
consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 



KJ_OA11_cds.1 267
264 42329 42329 A>G
isSNP SNP00116557
GB:HS425C14_2135776CD1



265 



Missense 



375-375 



KJ_OA11_cds.1 267 3'

264 44297 44297 A>G
wetSNP GB:HS425C14.v44297.T>C

GB:HS425C14_2135776CD1 265 Intron
KJ_OA11_cds.1 267 3'

264 55697 55697 A>G
Allele



source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 



wetSNP 



GB:HS425C14.v55697.C>T



GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 68954 68954 C>G
isSNP SNP00100733
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267



GIF KJ_OA11-genomic-rev.gif



265 Intron 
Intron 



265 
5' 



KJ_OA2 

Link : KJ_OA2_link_cdna

Subsequence LG:244552.16 1 1825 #268

Allele LG:244552.16 268 1476 1476 G>T

source isSNP SNP00098862



KJ_OA21

Full name : FL project 2027624
Link : FL_2027624_link_cdna

Subsequence FN:2027624CB1

CDS FN:2027624CB1.1 1734 bp



1 2173
#270

4 1737

FN:2027624CB1 269 881 881

source isSNP SNP00106459

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 971 971

source isSNP SNP00075286

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 1092 1092

source isSNP SNP00106460

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 1254 1254

source isSNP SNP00075287

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 1374 1374

source isSNP SNP00009699

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 1392 1392

source isSNP SNP00097916

consequence FN:2027624CB1.1 270
FN:2027624CB1 269 1623 1623

source isSNP SNP00009700

consequence FN:2027624CB1.1 270
GIF KJ_OA21-cdna-fwd.gif
Link : FL_1250708_link_genomic



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



#269 



C>G

Missense 
A>G 

Missense 
C>G

Silent 
A>G 

Silent 
A>G 

Silent 
A>G 

Silent 
A>G 

Silent 



Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
mRNA 



GB:HS453C12 1 



147620 



GB:HS453C12_1394592CD1 87967

GB:HS453C12_2027624CD1 20194

FL_1394592_mrna_build.1 87945

FL_2027624_mrna_build.1 20197

OA21_cds.1 20194 17050 #276



293-293 



323-323 



363-363 



417-417 



457-457 



463-463 



540-540 



T>R 



T>I 



#271 
109084 
10528 #273 
110578 
6152 #275 



#272 
#274 



FL_2027624_mrna_build.1 2172 bp 13 exons #275
exon 20197 20008
exon 19834 19657
exon 17499 17372
exon 17056 16956
exon 16847 16761
exon 16215 16128
exon 16019 15922
exon 15823 15658
exon 14968 14768
exon 12135 11970
exon 11855 11772
exon 10777 10110
exon 6168 6152
OA21_cds.1 372 bp 3 exons #276
exon 20194 20008
exon 19834 19657
exon 17056 17050
GB:HS453C12_2027624CD1 1734 bp 12 exons #273
exon 20194 20008
exon 19834 19657
exon 17499 17372
exon 17056 16956
exon 16847 16761
exon 16215 16128
exon 16019 15922
exon 15823 15658
exon 14968 14768
exon 12135 11970
exon 11855 11772
exon 10777 10528
Allele GB:HS453C12 271 10642 10642 A>G
source isSNP SNP00009700
source wetSNP GB:HS453C12.v10642.A>G
source wetSNP GB:HS453C12.v10642.A>G
consequence OA21_cds.1 276 3'
consequence GB:HS453C12_2027624CD1 273 Silent



Y 

Allele 



Allele 



T 

Allele 



Allele 



540-540 



GB:HS453C12 271 11206 11206 A>G 

source dbSNP gnl | dbSNP | ss979258_allele 

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Intron

GB:HS453C12 271 11999 11999 A>G

source isSNP SNP00009699

source wetSNP GB:HS453C12.v11999.C>T

source wetSNP GB:HS453C12.v11999.C>T

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Silent



457-457 



GB:HS453C12 

source 

consequence 

consequence 

GB:HS453C12 

source 

consequence 



271 13494 13494 A>G 

isSNP SNP00095042 

OA21_cds.1 276 3'

GB:HS453C12_2027624CD1

271 14913 14913 C>G

isSNP SNP00106460

OA21_cds.1 276 3'



273 



Intron 
L

Allele 



T>I 



consequence GB:HS453C12_2027624CD1 273 Silent 363-363

GB:HS453C12 271 15723 15723 A>G

source isSNP SNP00075286

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Missense 323-323



GIF KJ_OA21-genomic-rev.gif 



KJ_OA29 

Link : KJ_OA29_link_cdna



Subsequence LG:199489.1 1 3318 #277
Allele LG:199489.1 277 544 544 A>G
source isSNP SNP00005297
Allele LG:199489.1 277 695 695 A>G
source isSNP SNP00121995
Allele LG:199489.1 277 971 971 A>G
source isSNP SNP00047679
Allele LG:199489.1 277 1312 1312 A>G
source isSNP SNP00005298
Allele LG:199489.1 277 1445 1445 A>G
source isSNP SNP00027647
Allele LG:199489.1 277 2370 2370 A>G
source isSNP SNP00005297
Allele LG:199489.1 277 2521 2521 A>G
source isSNP SNP00121995
Allele LG:199489.1 277 2797 2797 A>G
source isSNP SNP00047679
Allele LG:199489.1 277 3138 3138 A>G
source isSNP SNP00005298
Allele LG:199489.1 277 3271 3271 A>G
source isSNP SNP00027647





KJ_OA3 

Link : KJ_OA3_link_cdna 

Subsequence LG:153511.1 1 1628 #278

Allele LG:153511.1 278 395 395 A>G 

source isSNP SNP00003503 

Allele LG:153511.1 278 1101 1101 A>G 

source isSNP SNP00113687 



KJ_OA31 

Link : KJ_OA31_link_cdna 

Subsequence LG:200972.2 1 2192 #279
Allele LG:200972.2 279 366 366 C>G
source isSNP SNP00099556
Allele LG:200972.2 279 836 836 A>G
source isSNP SNP00015954
Allele LG:200972.2 279 1037 1037 A>G
source isSNP SNP00015955
Allele LG:200972.2 279 1361 1361 A>G
source isSNP SNP00000598
Allele LG:200972.2 279 1697 1697 A>G
source isSNP SNP00000599
Allele LG:200972.2 279 1975 1975 A>G
source isSNP SNP00067907
Allele LG:200972.2 279 2027 2027 A>G
source isSNP SNP00067908





KJ_OA33 

Full name : cardiotrophin-like cytokine 

Link : FL_1676240_link_genomic 

Subsequence GB:AC005849_1 1 169144 #280

Subsequence KJ_OA33_cds.1 151862 143455 #281

Subsequence KJ_OA33_mrna_build.1 151907 142489 #282

CDS KJ_OA33_cds.1 678 bp 3 exons #281
exon 151862 151847
exon 145945 145779
exon 143949 143455
mRNA KJ_OA33_mrna_build.1 1689 bp 3 exons #282
exon 151907 151847
exon 145945 145779
exon 143949 142489



GIF KJ_OA33-genomic-rev.gif 



KJ_OA39 

Link : KJ_OA39_link_cdna 

Subsequence LG:293953.1 1 940 #283

Allele LG:293953.1 283 679 679 G>T 

source isSNP SNP00110603 



KJ_OA6 

Full name : FL project 2840746 
Link : FL_818498_link_genomic

Subsequence GB:AC005598 1 190000 #284

Subsequence GB:AC005598_2840746CD1 132700 133368

Subsequence FL_2840746_mrna_build.1 132672 135584

CDS GB:AC005598_2840746CD1 669 bp 1 exon #285

exon 132700 133368

FL_2840746_mrna_build.1 1087 bp 2 exons #286

132672 133391
135218 135584

GB:AC005598 284 132689 132689 A>G

isSNP SNP00005520

GB:AC005598_2840746CD1 285 5'
284 132843 132843 A>G

wetSNP GB:AC005598.v132843.C>T

285 Silent 



#285 
#286 



mRNA 

exon 
exon 
Allele 



Allele 



source 
consequence 
GB:AC005598 
source 
consequence 



GB:AC005598_2840746CD1 



48-48 S 
Allele



Allele 



Allele 



Allele 



G>V 
Allele 



GB:AC005598

source

consequence

GB:AC005598

source 

consequence 

GB:AC005598 

source 

consequence 

GB:AC005598 

source 

consequence 



GB:AC005598 
source 
consequence 
GIF KJ_OA6-genomic-fwd. 



284 132878 132878 A>G

wetSNP GB:AC005598.v132878.G>A

GB:AC005598_2840746CD1 285 Missense 60-60 R>H

284 132951 132951 A>G

wetSNP GB:AC005598.v132951.C>T

GB:AC005598_2840746CD1 285 Silent 84-84 F

284 132967 132967 A>G

wetSNP GB:AC005598.v132967.C>T

GB:AC005598_2840746CD1 285 Missense 90-90 P>S

284 133103 133103 G>T

wetSNP GB:AC005598.v133103.G>T

GB:AC005598_2840746CD1 285 Missense 135-135

284 133481 133481 A>G

wetSNP GB:AC005598.v133481.C>T

GB:AC005598_2840746CD1 285 3'
.gif 



KJ_oagba3

Link : KJ_oagba3_link_cdna 

Subsequence LG:215642.2 1 2849 #287
Allele LG:215642.2 287 1475 1475 A>G
source isSNP SNP00041601
Allele LG:215642.2 287 1963 1963 A>G
source isSNP SNP00010951



LIF 

Full name : leukemia inhibitory factor 

Link : LIF_link_cdna 

Subsequence GB:LIF 1 3848 #288

CDS GB:LIF.1 609 bp #289

ORF 45 653
Allele GB:LIF 288 1183 1183 G>T
source isSNP SNP00036337
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 1572 1572 A>G
source isSNP SNP00099092
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 1996 1996 C>G
source isSNP SNP00099093
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 2062 2062 G>T
source isSNP SNP00099094
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 2404 2404 A>G
source isSNP SNP00099095
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 3156 3156 A>G
source isSNP SNP00036338
consequence GB:LIF.1 289 3'
Allele GB:LIF 288 3582 3582 A>G
source isSNP SNP00008778

consequence GB:LIF.1 289 3'

GIF LIF-cdna-fwd.gif
Link : OSM_link_genomic

Subsequence GB:AC004264 1 47188 #290

Subsequence LIF_cds.1 11398 8354 #291

Subsequence LIF_mrna_build.1 11442 5156 #292

CDS LIF_cds.1 609 bp 3 exons #291
exon 11398 11380
exon 9636 9458
exon 8764 8354
mRNA LIF_mrna_build.1 3851 bp 3 exons
exon 11442 11380
exon 9636 9458
exon 8764 5156
Allele GB:AC004264 290 5420 5420 A>G
source isSNP SNP00008778
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 5846 5846 A>G
source isSNP SNP00036338
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 6598 6598 A>G
source isSNP SNP00099095
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 6940 6940 G>T
source isSNP SNP00099094
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 7006 7006 C>G
source isSNP SNP00099093
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 7435 7435 A>G
source isSNP SNP00099092
consequence LIF_cds.1 291 3'
Allele GB:AC004264 290 7824 7824 G>T
source isSNP SNP00036337
consequence LIF_cds.1 291 3'



GIF LIF-genomic-rev.gif 



LUM 

Full name : lumican 
Link : FL_2676170_link_genomic 

Subsequence GB:AC007115_1 1 180821 #293

Subsequence GB:AC007115_1_3128106CD1 87417 92234 #294

Subsequence FL_3128106_mrna_build.1 84719 92839 #295

mRNA FL_3128106_mrna_build.1 1926 bp 3 exons #295

exon 84719 84998
exon 87396 88278
exon 92077 92839
CDS GB:AC007115_1_3128106CD1 1020 bp 2 exons #294

exon 87417 88278
exon 92077 92234
Allele GB:AC007115_1 293 89050 89050 A>G

source dbSNP gnl | dbSNP | ss852530_allele

source dbSNP gnl | dbSNP | ss897123_allele

consequence GB:AC007115_1_3128106CD1 294 Intron

Allele GB:AC007115_1 293 89249 89249 A>G

source dbSNP gnl | dbSNP | ss855039_allele

consequence GB:AC007115_1_3128106CD1 294 Intron

GIF LUM-genomic-fwd.gif 



METTL1

Full name : methyltransferase-like 1
Link : METTL1_link_cdna

Subsequence GB:Y18643_1 1 1292 #296
CDS GB:Y18643_1.1 831 bp #297
ORF 49 879
Allele GB:Y18643_1 296 345 345 A>G
source isSNP SNP00098761
consequence GB:Y18643_1.1 297 Silent 99-99 P
Allele GB:Y18643_1 296 919 919 A>G
source isSNP SNP00003825
consequence GB:Y18643_1.1 297

GIF METTL1-cdna-fwd.gif



MMP1 

Full name : matrix metalloproteinase 1 
Link : MMP1_link_cdna

Subsequence EM:HSCOLL1 1 1970 #298
Allele EM:HSCOLL1 298 383 383 A>G
source isSNP SNP00009627
Allele EM:HSCOLL1 298 714 714 A>G
source isSNP SNP00037857
Allele EM:HSCOLL1 298 745 745 A>G
source isSNP SNP00037858
Allele EM:HSCOLL1 298 1522 1522 A>G
source isSNP SNP00009628
Allele EM:HSCOLL1 298 1541 1541 A>G
source isSNP SNP00009629
Allele EM:HSCOLL1 298 1662 1662 A>G
source isSNP SNP00009630
Allele EM:HSCOLL1 298 1747 1747 A>G
source isSNP SNP00009631

Link : MMP1_link_genomic

Subsequence GB:HSU78045 1 81826 #299
Subsequence MMP1_cds.1 11905 4225 #300
Subsequence MMP1_mrna_build.1 11973 3733 #301
CDS MMP1_cds.1 1410 bp 10 exons #300
exon 11905 11801
exon 11314 11070
exon 10976 10828
exon 10603 10478
exon 9421 9266
exon 9105 8988
exon 6551 6418
exon 5308 5146
exon 4619 4516
exon 4334 4225
mRNA MMP1_mrna_build.1 1970 bp 10 exons #301
exon 11973 11801
exon 11314 11070
exon 10976 10828
exon 10603 10478
exon 9421 9266
exon 9105 8988
exon 6551 6418
exon 5308 5146
exon 4619 4516
exon 4334 3733












Allele GB:HSU78045 299 3956 3956 A>G
source isSNP SNP00009631
consequence MMP1_cds.1 300 3'
Allele GB:HSU78045 299 4041 4041 A>G
source isSNP SNP00009630
consequence MMP1_cds.1 300 3'
Allele GB:HSU78045 299 4162 4162 A>G
source isSNP SNP00009629
consequence MMP1_cds.1 300 3'
Allele GB:HSU78045 299 4181 4181 A>G
source isSNP SNP00009628
consequence MMP1_cds.1 300 3'
Allele GB:HSU78045 299 4517 4517 A>G
source wetSNP GB:HSU78045.v4517.A>G
consequence MMP1_cds.1 300 Silent 433-433
Allele GB:HSU78045 299 4661 4664 CATG>CG
source wetSNP GB:HSU78045.v4661.CATG>CG
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 4677 4677 A>G
source wetSNP GB:HSU78045.v4677.G>A
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 5198 5198 A>G
source wetSNP GB:HSU78045.v5198.A>G
consequence MMP1_cds.1 300 Missense 382-382
Allele GB:HSU78045 299 6586 6586 A>G
source wetSNP GB:HSU78045.v6586.T>C
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 9056 9056 A>G
source wetSNP GB:HSU78045.v9056
consequence MMP1_cds.1 300 Silent 277-277
Allele GB:HSU78045 299 9120 9120 A>G
source wetSNP GB:HSU78045.v9120.A>G
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 9126 9126 A>G
source wetSNP GB:HSU78045.v9126.G>A
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 9205 9205 A>G
source wetSNP GB:HSU78045.v9205.T>C
consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 9247 9247 A>G
source wetSNP GB:HSU78045.v9247.T>C





TABLE 1 (Cont.)



consequence MMP1_cds.1 300 Intron
Allele GB:HSU78045 299 9365 9365 G>T
source wetSNP GB:HSU78045.v9365.G>T
consequence MMP1_cds.1 300 Missense 228-228 H>N
Allele GB:HSU78045 299 9370 9370 A>G
source isSNP SNP00037858
consequence MMP1_cds.1 300 Missense 226-226 L>P
Allele GB:HSU78045 299 11105 11105 A>G
source isSNP SNP00009627
source wetSNP GB:HSU78045.v11105.C>T
consequence MMP1_cds.1 300 Silent 105-105



GIF MMPl-genomic-rev.gif 



MMP13 

Full name : MMP13 

Link : MMP13_link_genomic 

Subsequence MMP13_cds.1 141629 159614 #302
Subsequence GB:AP000789_1 1 201766 #303
CDS MMP13_cds.1 957 bp 7 exons #302
exon 141629 141779
exon 141956 142081
exon 144063 144224
exon 146009 146126
exon 147078 147211
exon 157208 157367
exon 159509 159614
Allele GB:AP000789_1 303 141614 141614 C>G
source wetSNP GB:AP000789_1.v141614.C>G
consequence MMP13_cds.1 302 5'
Allele GB:AP000789_1 303 141875 141875 G>T
source wetSNP GB:AP000789_1.v141875.C>A
consequence MMP13_cds.1 302 Intron
Allele GB:AP000789_1 303 147095 147095 A>G
source wetSNP GB:AP000789_1.v147095.A>G
consequence MMP13_cds.1 302 Missense 192-192 H>R
Allele GB:AP000789_1 303 157231 157231 C>G
source wetSNP GB:AP000789_1.v157231.G>C
consequence MMP13_cds.1 302 Missense 239-239 G>R
Allele GB:AP000789_1 303 157325 157325 A>G
source wetSNP GB:AP000789_1.v157325.A>G
consequence MMP13_cds.1 302 Missense 270-270 D>G
Allele GB:AP000789_1 303 159631 159631 A>G
source wetSNP GB:AP000789_1.v159631.C>T
consequence MMP13_cds.1 302 3'
Allele GB:AP000789_1 303 159644 159644 C>G
source wetSNP GB:AP000789_1.v159644.G>C
consequence MMP13_cds.1 302 3'



GIF MMP13-genomic-fwd.gif 



MMP14 

Full name : MMP14 



Link : MMP14_link_cdna

Subsequence GB:HUMMTMMP 1 3403 #304
CDS GB:HUMMTMMP.1 1749 bp #305
ORF 112 1860
Allele GB:HUMMTMMP 304 133 133 A>G
source isSNP SNP00107954
consequence GB:HUMMTMMP.1 305 Missense 8-8 S>P
Allele GB:HUMMTMMP 304 580 580 A>G
source isSNP SNP00107955
consequence GB:HUMMTMMP.1 305 Silent 157-157
Allele GB:HUMMTMMP 304 888 888 C>G
source isSNP SNP00093383
consequence GB:HUMMTMMP.1 305 Silent 259-259
Allele GB:HUMMTMMP 304 966 966 A>G
source isSNP SNP00055171
consequence GB:HUMMTMMP.1 305 Silent 285-285
Allele GB:HUMMTMMP 304 1243 1243 A>G
source isSNP SNP00107956
consequence GB:HUMMTMMP.1 305 Missense 378-378 K>E
Allele GB:HUMMTMMP 304 1264 1264 C>G
source isSNP SNP00107957
consequence GB:HUMMTMMP.1 305 Missense 385-385 D>H
Allele GB:HUMMTMMP 304 1944 1944 A>G
source isSNP SNP00060446
consequence GB:HUMMTMMP.1 305

GIF MMP14-cdna-fwd.gif
Link : MMP14_link_genomic

Subsequence MMP14_cds.1 132034 141254 #306
Subsequence GB:AL133448_3 1 173805 #307
Subsequence MMP14_mrna_build.1 131922 142801 #308
CDS MMP14_cds.1 1749 bp 10 exons #306
exon 132034 132141
exon 136706 136854
exon 137128 137250
exon 137625 137932
exon 138472 138633
exon 138925 139085
exon 139586 139724
exon 139845 139995
exon 140466 140581
exon 140923 141254
mRNA MMP14_mrna_build.1 3408 bp 10 exons #308
exon 131922 132141
exon 136706 136854
exon 137128 137250
exon 137625 137932
exon 138472 138633
exon 138925 139085
exon 139586 139724
exon 139845 139995
exon 140466 140581
exon 140923 142801
Allele GB:AL133448_3 307 132055 132055 A>G
source isSNP SNP00107954
consequence MMP14_cds.1 306 Missense 8-8 P>S
Allele GB:AL133448_3 307 137049 137051 TTA>TA
source wetSNP GB:AL133448_3.v137049.TTA>TA
consequence MMP14_cds.1 306 Intron
Allele GB:AL133448_3 307 137713 137713 A>G
source isSNP SNP00107955
consequence MMP14_cds.1 306 Silent 157-157
Allele GB:AL133448_3 307 138406 138406 A>G
source wetSNP GB:AL133448_3.v138406.G>A
consequence MMP14_cds.1 306 Intron
Allele GB:AL133448_3 307 138560 138560 C>G
source isSNP SNP00093383
source wetSNP GB:AL133448_3.v138560.C>G
consequence MMP14_cds.1 306 Silent 259-259
Allele GB:AL133448_3 307 138653 138653 A>G
source wetSNP GB:AL133448_3.v138653.G>A
consequence MMP14_cds.1 306 Intron
Allele GB:AL133448_3 307 139639 139639 A>G
source wetSNP GB:AL133448_3.v139639.G>A
consequence MMP14_cds.1 306 Missense 355-355 M>I
Allele GB:AL133448_3 307 139981 139981 A>G
source wetSNP GB:AL133448_3.v139981.C>T
consequence MMP14_cds.1 306 Silent 429-429
Allele GB:AL133448_3 307 139986 139986 A>G
source wetSNP GB:AL133448_3.v139986.G>A
consequence MMP14_cds.1 306 Missense 431-431 R>H
Allele GB:AL133448_3 307 141337 141337 A>G
source isSNP SNP00060446
consequence MMP14_cds.1 306

GIF MMP14-genomic-fwd.gif



MMP2 

Link : MMP2_link_cdna

Subsequence GB:HSMMPM2 1 3530 #309
CDS GB:HSMMPM2.1 2010 bp #310
ORF 49 2058
Allele GB:HSMMPM2 309 681 681 A>G
source isSNP SNP00100004
consequence GB:HSMMPM2.1 310 Silent 211-211
Allele GB:HSMMPM2 309 1835 1835 A>G
source isSNP SNP00100005
consequence GB:HSMMPM2.1 310 Missense 596-596 D>G
Allele GB:HSMMPM2 309 1851 1851 G>T
source isSNP SNP00075435
consequence GB:HSMMPM2.1 310 Missense 601-601 F>L
Allele GB:HSMMPM2 309 2717 2717 A>G
source isSNP SNP00024650
consequence GB:HSMMPM2.1 310
Allele GB:HSMMPM2 309 2922 2922 C>G
source isSNP SNP00024651
consequence GB:HSMMPM2.1 310

GIF MMP2-cdna-fwd.gif
Link : MMP2_link_genomic

Subsequence MMP2_cds.1 175558 156463 #311
Subsequence GB:AC012182_3 1 190117 #312
Subsequence MMP2_mrna_build.1 175606 155007 #313
CDS MMP2_cds.1 2010 bp 10 exons #311
exon 175558 175397
exon 164437 164289
exon 163643 163515
exon 162034 161727
exon 161372 161211
exon 160292 160039
exon 159678 159540
exon 158699 158549
exon 158397 158282
exon 156902 156463
mRNA MMP2_mrna_build.1 3514 bp 10 exons #313
exon 175606 175397
exon 164437 164289
exon 163643 163515
exon 162034 161727
exon 161372 161211
exon 160292 160039
exon 159678 159540
exon 158699 158549
exon 158397 158282
exon 156902 155007
Allele GB:AC012182_3 312 155598 155598 C>G
source isSNP SNP00024651
consequence MMP2_cds.1 311 3'
Allele GB:AC012182_3 312 155804 155804 A>G
source isSNP SNP00024650
consequence MMP2_cds.1 311 3'
Allele GB:AC012182_3 312 156670 156670 G>T
source isSNP SNP00075435
consequence MMP2_cds.1 311 Missense 601-601 F>L
Allele GB:AC012182_3 312 156686 156686 A>G
source isSNP SNP00100005
consequence MMP2_cds.1 311 Missense 596-596 D>G
Allele GB:AC012182_3 312 161842 161842 A>G
source isSNP SNP00100004
consequence MMP2_cds.1 311 Silent 211-211 P
Allele GB:AC012182_3 312 163660 163660 A>G
source wetSNP GB:AC012182_3.v163660.G>A
consequence MMP2_cds.1 311 Intron

GIF MMP2-genomic-rev.gif



MMP3 

Full name : matrix metalloproteinase 3 

Link : MMP3_link_cdna

Subsequence EM:HSSTROMR 1 1801 #314
Allele EM:HSSTROMR 314 331 331 A>G
source isSNP SNP00011525
Allele EM:HSSTROMR 314 382 382 A>G
source isSNP SNP00113489
Allele EM:HSSTROMR 314 713 713 A>G
source isSNP SNP00015044
Allele EM:HSSTROMR 314 976 976 A>G
source isSNP SNP00054705
Allele EM:HSSTROMR 314 1129 1129 A>G
source isSNP SNP00011527

Link : MMP3_link_genomic

Subsequence EM:HSU78045 100 81925 #315
Subsequence MMP3_link_cds.1 57437 50020 #316
Subsequence MMP3_mrna_build.1 57480 49696 #317
CDS MMP3_link_cds.1 1434 bp 10 exons #316
exon 57437 57333
exon 56806 56562
exon 56469 56321
exon 56182 56057
exon 54487 54323
exon 54146 54002
exon 53137 53004
exon 52604 52445
exon 51295 51192
exon 50120 50020
mRNA MMP3_mrna_build.1 1801 bp 10 exons #317
exon 57480 57333
exon 56806 56562
exon 56469 56321
exon 56182 56057
exon 54487 54323
exon 54146 54002
exon 53137 53004
exon 52604 52445
exon 51295 51192
exon 50120 49696
Allele EM:HSU78045 315 52375 52375 A>G
source wetSNP EM:HSU78045.v52375.T>C
consequence MMP3_link_cds.1 316 Silent 400-400
Allele EM:HSU78045 315 52411 52411 A>G
source wetSNP EM:HSU78045.v52411.G>A
consequence MMP3_link_cds.1 316 Silent 388-388
Allele EM:HSU78045 315 52489 52489 A>G
source wetSNP EM:HSU78045.v52489.G>A
consequence MMP3_link_cds.1 316 Silent 362-362
Allele EM:HSU78045 315 52527 52530 GAGT>GT
source wetSNP EM:HSU78045.v52527.GAGT>GT
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 52586 52586 A>T
source wetSNP EM:HSU78045.v52586.T>A
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 53771 53771 A>T
source wetSNP EM:HSU78045.v53771.T>A
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 54077 54077 C>G
source wetSNP EM:HSU78045.v54077.C>G
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 54187 54187 A>G
source wetSNP EM:HSU78045.v54187.C>T
consequence MMP3_link_cds.1 316 Intron
T
Allele EM:HSU78045 315 54402 54402 A>G
source wetSNP EM:HSU78045.v54402.C>T
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 56119 56119 A>G
source wetSNP EM:HSU78045.v56119.C>T
consequence MMP3_link_cds.1 316 Intron
Allele EM:HSU78045 315 56507 56507 C>G
source wetSNP EM:HSU78045.v56507.G>C
consequence MMP3_link_cds.1 316 Silent 102-102 T
Allele EM:HSU78045 315 56525 56525 A>G
source isSNP SNP00011525
source wetSNP EM:HSU78045.v56525.G>A
consequence MMP3_link_cds.1 316 Silent 96-96 D
Allele EM:HSU78045 315 56680 56680 A>G
source wetSNP EM:HSU78045.v56680.C>T
consequence MMP3_link_cds.1 316 Missense 45-45 E>K



GIF MMP3-genomic-rev.gif 



MMP9 

Full name : matrix metalloproteinase 

Link : MMP9_link_cdna

Subsequence FN:522678CB1 1 2348 #318
CDS FN:522678CB1.1 2124 bp #319
ORF 33 2156
Allele FN:522678CB1 318 308 308 A>G
source isSNP SNP00101082
consequence FN:522678CB1.1 319 Silent 92-92 K
Allele FN:522678CB1 318 413 413 A>G
source isSNP SNP00101083
consequence FN:522678CB1.1 319 Silent 127-127 N
Allele FN:522678CB1 318 534 534 A>G
source isSNP SNP00101084
consequence FN:522678CB1.1 319 Missense 168-168 I>V
Allele FN:522678CB1 318 591 591 A>G
source isSNP SNP00101085
consequence FN:522678CB1.1 319 Missense 187-187 L>F
Allele FN:522678CB1 318 719 719 A>G
source isSNP SNP00101086
consequence FN:522678CB1.1 319 Silent 229-229
Allele FN:522678CB1 318 748 748 A>G
source isSNP SNP00021346
consequence FN:522678CB1.1 319 Missense 239-239 R>H
Allele FN:522678CB1 318 868 868 A>G
source isSNP SNP00002987
consequence FN:522678CB1.1 319 Missense 279-279 Q>R
Allele FN:522678CB1 318 1604 1604 A>G
source isSNP SNP00021347
consequence FN:522678CB1.1 319 Silent 524-524
Allele FN:522678CB1 318 1853 1853 G>T
source isSNP SNP00002988
consequence FN:522678CB1.1 319 Silent 607-607
Allele FN:522678CB1 318 2159 2159 A>G
source isSNP SNP00062663
consequence FN:522678CB1.1 319 3'
Allele FN:522678CB1 318 2302 2302 A>G
source isSNP SNP00021348
consequence FN:522678CB1.1 319 3'

GIF MMP9-cdna-fwd.gif








Link : MMP9_link_genomic

Subsequence GB:HUMIVCOL01 1 764 #320
Subsequence GB:HUMIVCOL02 865 1117 #321
Subsequence GB:HUMIVCOL03 1218 1386 #322
Subsequence GB:HUMIVCOL04 1487 1635 #323
Subsequence GB:HUMIVCOL05 1736 1929 #324
Subsequence GB:HUMIVCOL06 2030 2223 #325
Subsequence GB:HUMIVCOL07 2324 2520 #326
Subsequence GB:HUMIVCOL08 2621 2796 #327
Subsequence GB:HUMIVCOL09 2897 3196 #328
Subsequence GB:HUMIVCOL10 3297 3456 #329
Subsequence GB:HUMIVCOL11 3557 3727 #330
Subsequence GB:HUMIVCOL12 3828 3951 #331
Subsequence GB:HUMIVCOL13 4052 4371 #332
Subsequence MMP9_cds.1 619 4180 #333
Subsequence MMP9_mrna_build.1 587 4371 #334
CDS MMP9_cds.1 2124 bp 13 exons #333
exon 619 756
exon 875 1107
exon 1228 1376
exon 1497 1625
exon 1746 1919
exon 2040 2213
exon 2334 2510
exon 2631 2786
exon 2907 3186
exon 3307 3446
exon 3567 3717
exon 3838 3941
exon 4062 4180
mRNA MMP9_mrna_build.1 2348 bp 13 exons #334
exon 587 756
exon 875 1107
exon 1228 1376
exon 1497 1625
exon 1746 1919
exon 2040 2213
exon 2334 2510
exon 2631 2786
exon 2907 3186
exon 3307 3446
exon 3567 3717
exon 3838 3941
exon 4061 4371
Allele GB:HUMIVCOL01 320 677 677 A>G
source wetSNP GB:HUMIVCOL01.v677.C>T
consequence MMP9_cds.1 333 Missense 20-20 A>V
Allele GB:HUMIVCOL02 321 148 148 A>G
source isSNP SNP00101082
consequence MMP9_cds.1 333 Silent 92-92 K
Allele GB:HUMIVCOL04 323 49 49 A>G
source isSNP SNP00101085
consequence MMP9_cds.1 333 Missense 187-187 L>F
Allele GB:HUMIVCOL05 324 48 48 A>G
source isSNP SNP00101086
consequence MMP9_cds.1 333 Silent 229-229
Allele GB:HUMIVCOL05 324 77 77 A>G
source isSNP SNP00021346
consequence MMP9_cds.1 333 Missense 239-239 R>H
Allele GB:HUMIVCOL09 328 252 252 A>G
source isSNP SNP00021347
consequence MMP9_cds.1 333 Silent 524-524
Allele GB:HUMIVCOL11 330 81 81 G>T
source isSNP SNP00002988
consequence MMP9_cds.1 333 Silent 607-607
Allele GB:HUMIVCOL13 332 87 87 A>G
source wetSNP GB:HUMIVCOL13.v87.G>A
consequence MMP9_cds.1 333 Silent 694-694 V
Allele GB:HUMIVCOL13 332 132 132 A>G
source wetSNP GB:HUMIVCOL13.v132.C>T
consequence MMP9_cds.1 333 3'
Allele GB:HUMIVCOL13 332 274 274 A>G
source isSNP SNP00021348
consequence MMP9_cds.1 333

GIF MMP9-genomic-fwd.gif



MSF 

Full name : megakaryocyte stimulating factor 

Link : MSF_link_cdna

Subsequence GB:NM_005807 1 5041 #335
CDS GB:NM_005807.1 4215 bp #336
ORF 34 4248
Allele GB:NM_005807 335 1011 1011 A>G
source isSNP SNP00064566
consequence GB:NM_005807.1 336 Silent 326-326 K
Allele GB:NM_005807 335 2650 2650 A>G
source isSNP SNP00108532
consequence GB:NM_005807.1 336 Missense 873-873 P>S
Allele GB:NM_005807 335 3171 3171 A>G
source isSNP SNP00009620
consequence GB:NM_005807.1 336 Silent 1046-1046
Allele GB:NM_005807 335 4187 4187 A>G
source isSNP SNP00061665
consequence GB:NM_005807.1 336 Missense 1385-1385 A>V
Allele GB:NM_005807 335 4760 4760 A>G
source isSNP SNP00009621
consequence GB:NM_005807.1 336

GIF MSF-cdna-fwd.gif
Link : MSF_link_genomic

Subsequence MSF_cds.1 181003 197905 #337
Subsequence MSF_cds.2 181003 197905 #338
Subsequence MSF_cds.3 181003 197905 #339
Subsequence MSF_cds.4 181003 197905 #340
Subsequence GB:AL133553_7 1 214019 #341
Subsequence MSF_mrna_build.1 180982 198681 #342
CDS MSF_cds.3 3936 bp 10 exons #339
exon 181003 181078
exon 184218 184340
exon 185719 185838
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
mRNA MSF_mrna_build.1 5012 bp 12 exons #342
exon 180982 181078
exon 184218 184340
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 198681
CDS MSF_cds.4 3813 bp 9 exons #340
exon 181003 181078
exon 185719 185838
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
CDS MSF_cds.1 4215 bp 12 exons #337
exon 181003 181078
exon 184218 184340
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
CDS MSF_cds.2 4092 bp 11 exons #338
exon 181003 181078
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
Allele GB:AL133553_7 341 190505 190505 G>T
source wetSNP GB:AL133553_7.v190505.A>C
consequence MSF_cds.3 339 Missense 127-127 D>A
consequence MSF_cds.4 340 Missense 86-86 D>A
consequence MSF_cds.1 337 Missense 220-220 D>A
consequence MSF_cds.2 338 Missense 179-179 D>A
Allele GB:AL133553_7 341 190559 190559 A>G
source wetSNP GB:AL133553_7.v190559.C>T
consequence MSF_cds.3 339 Missense 145-145 T>M
consequence MSF_cds.4 340 Missense 104-104 T>M
consequence MSF_cds.1 337 Missense 238-238 T>M
consequence MSF_cds.2 338 Missense 197-197 T>M
Allele GB:AL133553_7 341 190755 190755 A>G
source wetSNP GB:AL133553_7.v190755.G>A
consequence MSF_cds.3 339 Silent 210-210 K
consequence MSF_cds.4 340 Silent 169-169 K
consequence MSF_cds.1 337 Silent 303-303 K
consequence MSF_cds.2 338 Silent 262-262 K
Allele GB:AL133553_7 341 190824 190824 A>G
source isSNP SNP00064566
consequence MSF_cds.3 339 Silent 233-233 K
consequence MSF_cds.4 340 Silent 192-192 K
consequence MSF_cds.1 337 Silent 326-326 K
consequence MSF_cds.2 338 Silent 285-285 K
Allele GB:AL133553_7 341 192463 192463 A>G
source isSNP SNP00108532
consequence MSF_cds.3 339 Missense 780-780 P>S
consequence MSF_cds.4 340 Missense 739-739 P>S
consequence MSF_cds.1 337 Missense 873-873 P>S
consequence MSF_cds.2 338 Missense 832-832 P>S
Allele GB:AL133553_7 341 192984 192984 A>G
source isSNP SNP00009620
consequence MSF_cds.3 339 Silent 953-953 P
consequence MSF_cds.4 340 Silent 912-912 P
consequence MSF_cds.1 337 Silent 1046-1046 P
consequence MSF_cds.2 338 Silent 1005-1005 P
Allele GB:AL133553_7 341 193235 193235 A>G
source wetSNP GB:AL133553_7.v193235.A>G
consequence MSF_cds.3 339 Missense 1037-1037 N>S
consequence MSF_cds.4 340 Missense 996-996 N>S
consequence MSF_cds.1 337 Missense 1130-1130 N>S
consequence MSF_cds.2 338 Missense 1089-1089 N>S
Allele GB:AL133553_7 341 193258 193258 A>G
source wetSNP GB:AL133553_7.v193258.A>G
consequence MSF_cds.3 339 Missense 1045-1045 M>V
consequence MSF_cds.4 340 Missense 1004-1004 M>V
consequence MSF_cds.1 337 Missense 1138-1138 M>V
consequence MSF_cds.2 338 Missense 1097-1097 M>V
Allele GB:AL133553_7 341 196691 196691 G>T
source isSNP SNP00023429
consequence MSF_cds.3 339 Intron
consequence MSF_cds.4 340 Intron
consequence MSF_cds.1 337 Intron
consequence MSF_cds.2 338 Intron
Allele GB:AL133553_7 341 197844 197844 A>G
source isSNP SNP00061665
consequence MSF_cds.3 339 Missense 1292-1292 A>V
consequence MSF_cds.4 340 Missense 1251-1251 A>V
consequence MSF_cds.1 337 Missense 1385-1385 A>V
consequence MSF_cds.2 338 Missense 1344-1344 A>V
Allele GB:AL133553_7 341 198417 198417 A>G
source isSNP SNP00009621
consequence MSF_cds.3 339 3'
consequence MSF_cds.4 340 3'
consequence MSF_cds.1 337 3'
consequence MSF_cds.2 338 3'

GIF MSF-genomic-fwd.gif



NCOR2 

Full name : nuclear receptor co-repressor 2 
Link : NCOR2_link_cdna

Subsequence GB:AF125672 1 8686 #343
CDS GB:AF125672.1 7524 bp #344
ORF 157 7680
Allele GB:AF125672 343 165 165 G>T
source isSNP SNP00035702
consequence GB:AF125672.1 344 Silent 3-3 G
Allele GB:AF125672 343 618 618 A>G
source isSNP SNP00105557
consequence GB:AF125672.1 344 Silent 154-154
Allele GB:AF125672 343 2859 2859 A>G
source isSNP SNP00101011
consequence GB:AF125672.1 344 Silent 901-901
Allele GB:AF125672 343 4728 4728 A>G
source isSNP SNP00075034
consequence GB:AF125672.1 344 Silent 1524-1524
Allele GB:AF125672 343 4749 4749 A>G
source isSNP SNP00069757
consequence GB:AF125672.1 344 Silent 1531-1531
Allele GB:AF125672 343 4957 4957 A>G
source isSNP SNP00101012
consequence GB:AF125672.1 344 Missense 1601-1601 Y>H
Allele GB:AF125672 343 5085 5085 A>G
source isSNP SNP00075035
consequence GB:AF125672.1 344 Silent 1643-1643
Allele GB:AF125672 343 5100 5100 A>G
source isSNP SNP00075036
consequence GB:AF125672.1 344 Silent 1648-1648
N
Allele GB:AF125672 343 5221 5221 A>G
source isSNP SNP00012485
consequence GB:AF125672.1 344 Missense 1689-1689 T>A
Allele GB:AF125672 343 7405 7405 A>G
source isSNP SNP00015859
consequence GB:AF125672.1 344 Missense 2417-2417 P>S
Allele GB:AF125672 343 7431 7431 A>G
source isSNP SNP00101013
consequence GB:AF125672.1 344 Silent 2425-2425
Allele GB:AF125672 343 7751 7751 A>G
source isSNP SNP00101014
consequence GB:AF125672.1 344
Allele GB:AF125672 343 8597 8597 A>G
source isSNP SNP00062569
consequence GB:AF125672.1 344
Allele GB:AF125672 343 8602 8602 A>G
source isSNP SNP00012487
consequence GB:AF125672.1 344

GIF NCOR2-cdna-fwd.gif



NOG 

Full name : NOG 

Link : NOG_link_genomic 

Subsequence GB:AC005553 1 179651 #345 

Subsequence NOG_cds.1 146202 145504 #346
Subsequence NOG_mrna_build.1 147012 145466 #347
CDS NOG_cds.1 699 bp 1 exon #346
exon 146202 145504
mRNA NOG_mrna_build.1 1547 bp 1 exon #347
exon 147012 145466
Allele GB:AC005553 345 145585 145585 A>G
source wetSNP GB:AC005553.v145585.G>A
consequence NOG_cds.1 346 Silent 206-206
GIF NOG-genomic-rev.gif 



NOTCH3

Link : NOTCH3_link_cdna

Subsequence GB:NOTCH3 8091 #348
CDS GB:NOTCH3.1 6966 bp #349
ORF 79 7044
Allele GB:NOTCH3 348 1218 1218 A>G
source isSNP SNP00116668
consequence GB:NOTCH3.1 349 Silent 380-380
Allele GB:NOTCH3 348 1565 1565 A>G
source isSNP SNP00116669
consequence GB:NOTCH3.1 349 Missense 496-496 P>L
Allele GB:NOTCH3 348 2616 2616 A>G
source isSNP SNP00116670
consequence GB:NOTCH3.1 349 Silent 846-846
Allele GB:NOTCH3 348 4520 4520 A>G
source isSNP SNP00116671
consequence GB:NOTCH3.1 349 Missense 1481-1481 D>G
Allele GB:NOTCH3 348 5740 5740 A>G
source isSNP SNP00054178
consequence GB:NOTCH3.1 349 Missense 1888-1888 F>L
Allele GB:NOTCH3 348 6355 6355 A>G
source isSNP SNP00037780
consequence GB:NOTCH3.1 349 Missense 2093-2093 A>T
Allele GB:NOTCH3 348 6516 6516 A>G
source isSNP SNP00054179
consequence GB:NOTCH3.1 349 Silent 2146-2146
Allele GB:NOTCH3 348 6746 6746 A>G
source isSNP SNP00048081
consequence GB:NOTCH3.1 349 Missense 2223-2223 V>A
Allele GB:NOTCH3 348 7733 7733 A>G
source isSNP SNP00037781
consequence GB:NOTCH3.1 349 3'
Allele GB:NOTCH3 348 7881 7881 A>G
source isSNP SNP00062225
consequence GB:NOTCH3.1 349 3'
Allele GB:NOTCH3 348 7914 7914 A>G
source isSNP SNP00066446
consequence GB:NOTCH3.1 349 3'
Allele GB:NOTCH3 348 8023 8023 A>G
source isSNP SNP00066447
consequence GB:NOTCH3.1 349

GIF NOTCH3-cdna-fwd.gif
Link : NOTCH3_link_genomic

Subsequence NOTCH3_cds.1 40733 3819 #350
Subsequence GB:AC004663_1 1 41150 #351
CDS NOTCH3_cds.1 6846 bp 32 exons #350
exon 40733 40657
exon 35676 35534
exon 35455 35117
exon 35024 34902
exon 34814 34581
exon 32585 32430
exon 32331 32146
exon 31505 31392
exon 31151 31038
exon 30495 30262
exon 30145 30035
exon 28836 28644
exon 28565 28414
exon 28176 28063
exon 27607 27452
exon 24958 24733
exon 24319 24118
exon 23985 23838
exon 23413 23229
exon 22653 22521
exon 22439 22182
exon 22098 21980
exon 21247 20682
exon 17557 17225
exon 13982 13828
exon 13710 13488
exon 13327 13243
exon 10568 10406
exon 9248 8944



234 
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TABLE 1 (Cont.) 



exon 
exon 
exon 
Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



8672 8525 
5719 5622 
4871 3819 

GB:AC004663_1 351 
source wetSNP 



3796 3796 A>T 
GB:AC004663_1.v3796.A>T



consequence NOTCH3_cds.1 350

GB:AC004663_1 351 4117 4117

source isSNP SNP00048081

consequence NOTCH3_cds.1 350

GB:AC004663_1 351 4347 4347

source isSNP SNP00054179

consequence NOTCH3_cds.1 350

GB:AC004663_1 351 4508 4508 

source isSNP SNP00037780 

consequence NOTCH3_cds . 1 
GB:AC004663_1 351 5727 

source wetSNP GB:AC004663. 

consequence NOTCH3_cds . 1 350 

GB:AC004663_1 351 5943 5943 



3 ' 
A>G 

Missense 
A>G 

Silent 
A>G 



350 
5727 



Missense 
A>G 

,l.v5727.A>G 
Intron 
A>G 

source dbSNP gnl|dbSNP|ss730238_allele

consequence NOTCH3_cds.1 350 Intron

GB:AC004663_1 351 17519 17519 A>G

source isSNP SNP00116671

consequence NOTCH3_cds.1 350 Missense

GB:AC004663_1 351 18749 18749 A>G



2183-2183 



2106-2106 



2053-2053 



A>V 



A>T 



1441-1441 



D>G 



dbSNP gnl|dbSNP|ss680542_allele
dbSNP gnl|dbSNP|ss1143619_allele
dbSNP gnl|dbSNP|ss372819_allele
NOTCH3_cds.1 350 Intron

22353 22353 A>G
GB:AC004663_1.v22353.C>T
1 350 Missense

23922 23922 C>G
GB:AC004663_1.v23922.C>G
350 Missense 
24045 A>G 



1 351 
wetSNP 
NOTCH3_cds 
_1 351 
wetSNP 

NOTCH3_cds . 1 
1 351 24045 



source 
source 
source 
consequence 
GB:AC004663. 
source 
consequence 
GB:AC004663. 
source 
consequence 
GB:AC004663. 
source 
consequence 
GB:AC004663. 
source 
consequence 
GB:AC004663_1 351 
source wetSNP 
consequence NOTCH3_cds 
GB:AC004663_1 351
source wetSNP
consequence NOTCH3_cds.1
GB:AC004663_1 351 29997



1143-1143 



980-980 



V>M 



A>P 



wetSNP GB:AC004663_1.v24045.T>C

NOTCH3_cds.1 350 Intron

1 351 27480 27480 A>G

isSNP SNP00116670

NOTCH3_cds.1 350 Silent

28173 28173 A>G
GB:AC004663_1.v28173.C>T
1 350 Missense

28749 28749 A>G
GB:AC004663_1.v28749.C>T
350 Missense 
29997 C>G 



806-806 



727-727 



640-640 



C 



R>H 



R>H 



source wetSNP GB:AC004663_1.v29997.G>C

consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 32482 32482 A>G

source isSNP SNP00116668

consequence NOTCH3_cds.1 350 Silent

GIF NOTCH3-genomic-rev.gif 



340-340 



NPR2

Full name : Atrionatriuretic Peptide Receptor Type B 
Link : NPR2_link_cdna

Subsequence GB:HUMGUANCYC 1 4081 #352

CDS GB:HUMGUANCYC.2 3144 bp #353

ORF 651 3794
Allele GB:HUMGUANCYC 352 2222 2222 A>G

source isSNP SNP00028343

consequence GB:HUMGUANCYC.2 353 Silent 524-524
GIF NPR2-cdna-fwd.gif 



OGN 

Full name : osteoglycin 

Link : OGN_link_cdna

Subsequence GB:HSM801395 2101 #354
CDS GB:HSM801395.1 441 bp #355
ORF 1 441
Allele GB:HSM801395 354 64 64 A>G
source isSNP SNP00100803
consequence GB:HSM801395.1 355 Missense 22-22 L>F
Allele GB:HSM801395 354 909 909 A>G
source isSNP SNP00011097
consequence GB:HSM801395.1 355
GIF OGN-cdna-fwd.gif
Link : OGN_link_genomic

Subsequence OGN_cds.2 48897 32003 #356
Subsequence GB:AL354924_2 1 192427 #357
Subsequence OGN_mrna_build.2 50083 30350 #358
mRNA OGN_mrna_build.2 2726 bp 7 exons #358
exon 50083 49983
exon 48969 48721
exon 46672 46579
exon 38619 38461
exon 35431 35229
exon 32679 32584
exon 32173 30350
CDS OGN_cds.2 900 bp 6 exons #356
exon 48897 48721
exon 46672 46579
exon 38619 38461
exon 35431 35229
exon 32679 32584
exon 32173 32003
Allele GB:AL354924_2 357 31535 31535 A>G
source isSNP SNP00011097
consequence OGN_cds.2 356 3'
Allele GB:AL354924_2 357 35339 35339 A>G
source isSNP SNP00100803
consequence OGN_cds.2 356 Missense 175-175 L>F
GIF OGN-genomic-rev.gif

OMD

Full name : osteomodulin 
Link : OMD_link_cdna

Subsequence GB:OMD 2263 #359
CDS GB:OMD.1 1266 bp #360
ORF 101 1366
Allele GB:OMD 359 159 159 C>G
source isSNP SNP00023658
consequence GB:OMD.1 360 Missense 20-20 C>S
Allele GB:OMD 359 762 762 A>G
source isSNP SNP00023659
consequence GB:OMD.1 360 Missense 221-221 S>N
Allele GB:OMD 359 1969 1969 A>G
source isSNP SNP00023660
consequence GB:OMD.1 360 3'
Allele GB:OMD 359 2071 2071 G>T
source isSNP SNP00106046
consequence GB:OMD.1 360 3'
GIF OMD-cdna-fwd.gif



Link : FL_1258977_link_genomic

Subsequence GB:AB009589 1 12414 #361
Subsequence GB:AB009589_1258977CD1 8540 10946 #362
Subsequence FL_1258977_mrna_build.1 1685 11855 #363
mRNA FL_1258977_mrna_build.1 2396 bp 3 exons #363
exon 1685 1892
exon 8524 9479
exon 10624 11855
CDS GB:AB009589_1258977CD1 1263 bp 2 exons #362
exon 8540 9479
exon 10624 10946
Allele GB:AB009589 361 8598 8598 C>G
source isSNP SNP00023658
consequence GB:AB009589_1258977CD1 362 Missense 20-20 C>S
Allele GB:AB009589 361 9201 9201 A>G
source isSNP SNP00023659
consequence GB:AB009589_1258977CD1 362 Missense 221-221 S>N
Allele GB:AB009589 361 10042 10042 A>G
source dbSNP gnl|dbSNP|ss312223_allele
consequence GB:AB009589_1258977CD1 362 Intron
Allele GB:AB009589 361 10596 10596 A>G
source wetSNP GB:AB009589.v10596.A>G
consequence GB:AB009589_1258977CD1 362 Intron
Allele GB:AB009589 361 11552 11552 A>G
source isSNP SNP00023660
consequence GB:AB009589_1258977CD1 362
Allele GB:AB009589 361 11654 11654 G>T
source isSNP SNP00106046
consequence GB:AB009589_1258977CD1 362
GIF OMD-genomic-fwd.gif



PDCD6IP 

Full name : programmed cell death 6-interacting protein

Link : PDCD6IP_link_cdna

Subsequence GB:AF151793 1 3221 #364
CDS GB:AF151793.1 2607 bp #365
ORF 127 2733
Allele GB:AF151793 364 1051 1051 A>G
source isSNP SNP00029958
consequence GB:AF151793.1 365 Missense 309-309 T>A
Allele GB:AF151793 364 1258 1258 A>G
source isSNP SNP00108790
consequence GB:AF151793.1 365 Missense 378-378 V>I
Allele GB:AF151793 364 1298 1298 G>T
source isSNP SNP00108791
consequence GB:AF151793.1 365 Missense 391-391 L>W
Allele GB:AF151793 364 1695 1695 A>G
source isSNP SNP00093444
consequence GB:AF151793.1 365 Silent 523-523
Allele GB:AF151793 364 2230 2230 A>G
source isSNP SNP00121559
consequence GB:AF151793.1 365 Missense 702-702 R>G
Allele GB:AF151793 364 2315 2315 A>G
source isSNP SNP00006604
consequence GB:AF151793.1 365 Missense 730-730 L>S
Allele GB:AF151793 364 2386 2386 A>G
source isSNP SNP00029960
consequence GB:AF151793.1 365 Missense 754-754 P>S
Allele GB:AF151793 364 2421 2421 A>G
source isSNP SNP00121560
consequence GB:AF151793.1 365 Silent 765-765
GIF PDCD6IP-cdna-fwd.gif



PDNP1 

Full name : phosphodiesterase I (nucleotide pyrophosphatase I (homologous to mouse Ly-41 antigen))
Link : PDNP1_link_cdna

Subsequence EM:HSAUTOTAX 1 3231 #366
CDS EM:HSAUTOTAX.2 2748 bp #367
ORF 50 2797
Allele EM:HSAUTOTAX 366 342 342 A>G
source isSNP SNP00025434
consequence EM:HSAUTOTAX.2 367 Missense 98-98 A>V
Allele EM:HSAUTOTAX 366 696 696 A>G
source isSNP SNP00075872
consequence EM:HSAUTOTAX.2 367 Missense 216-216 T>I
Allele EM:HSAUTOTAX 366 1682 1682 A>G
source isSNP SNP00025435
consequence EM:HSAUTOTAX.2 367 Missense 545-545 P>S
Allele EM:HSAUTOTAX 366 1789 1789 A>G
source isSNP SNP00004604
consequence EM:HSAUTOTAX.2 367 Silent 580-580 H
Allele EM:HSAUTOTAX 366 2398 2398 G>T
source isSNP SNP00122211
consequence EM:HSAUTOTAX.2 367 Silent 783-783 V
Allele EM:HSAUTOTAX 366 2539 2539 A>G
source isSNP SNP00004605
consequence EM:HSAUTOTAX.2 367 Silent 830-830
Allele EM:HSAUTOTAX 366 2681 2681 G>T
source isSNP SNP00059344
consequence EM:HSAUTOTAX.2 367 Silent 878-878
GIF PDNP1-cdna-fwd.gif
Link : PDNP1_link_genomic

Subsequence IN:98092911313498 4217 4948 #368
Subsequence IN:98061109562226435 5050 5980 #369
Subsequence IN:98092910591328158 3611 4115 #370
Subsequence IN:98092911013628201 100 699 #371
Subsequence IN:98092911024828217 2027 2526 #372
Subsequence IN:98092911044928261 3068 3509 #373
Subsequence IN:98092911065328292 801 1418 #374
Subsequence IN:98092913141116289 6183 6572 #375
Subsequence IN:98111010592914993 1520 1926 #376
Subsequence IN:98111011021915028 2628 2967 #377
Allele IN:98092910591328158 370 232 232 A>G
source isSNP SNP00025435
Allele IN:98092913141116289 375 189 189 G>T
source isSNP SNP00059344



PLA2G2A 

Full name : phospholipase A2, group IIA

Link : PLA2G2A_link_cdna

Subsequence GB:HUMRASFAB 1 854 #378
CDS GB:HUMRASFAB.1 435 bp #379
ORF 136 570
Allele GB:HUMRASFAB 378 267 267 A>G
source isSNP SNP00010003
consequence GB:HUMRASFAB.1 379 Silent 44-44 Y
Allele GB:HUMRASFAB 378 800 800 A>G
source isSNP SNP00021612
consequence GB:HUMRASFAB.1 379 3'
GIF PLA2G2A-cdna-fwd.gif
Link : PLA2G2A_link_genomic

Subsequence PLA2G2A_cds.1 51704 48629 #380
Subsequence PLA2G2A_mrna_build.1 52537 48418 #381
Subsequence GB:AL358253_1 1 180550 #382
Subsequence LG:474322.13_mrna_build.1 52786 48418 #383
Subsequence PLA2G2A_cds.2 51704 50985 #384
mRNA LG:474322.13_mrna_build.1 1028 bp 5 exons #383
exon 52786 52511
exon 51810 51665
exon 51455 51311
exon 51052 50946
exon 48771 48418
CDS PLA2G2A_cds.1 435 bp 4 exons #380
exon 51704 51665
exon 51455 51311
exon 51052 50946
exon 48771 48629
CDS PLA2G2A_cds.2 108 bp 2 exons #384
exon 51704 51665
exon 51052 50985
mRNA PLA2G2A_mrna_build.1 779 bp 5 exons #381
exon 52537 52511
exon 51810 51665
exon 51455 51311
exon 51052 50946
exon 48771 48418
Allele GB:AL358253_1 382 51364 51364 A>G
source isSNP SNP00010003
consequence PLA2G2A_cds.1 380 Silent 44-44 Y
consequence PLA2G2A_cds.2 384 Intron
Allele GB:AL358253_1 382 52584 52584 C>G
source isSNP SNP00021611
consequence PLA2G2A_cds.1 380 5'
consequence PLA2G2A_cds.2 384 5'
GIF PLA2G2A-genomic-rev.gif



PPP1R5 

Full name : protein phosphatase 1, regulatory (inhibitor) subunit 5
Link : PPP1R5_link_cdna

Subsequence GB:Y18207_1 1 1158 #385
CDS GB:Y18207_1.1 954 bp #386
ORF 92 1045
Allele GB:Y18207_1 385 571 571 A>G
source isSNP SNP00041149
consequence GB:Y18207_1.1 386 Silent 160-160
Allele GB:Y18207_1 385 1096 1096 G>T
source isSNP SNP00060710
consequence GB:Y18207_1.1 386
GIF PPP1R5-cdna-fwd.gif
Link : PPP1R5_link_genomic

Subsequence GB:AC020691_2 152048 #387
Subsequence PPP1R5_mrna_build.1 103997 107245 #388
Subsequence PPP1R5_cds.1 106194 107132 #389
CDS PPP1R5_cds.1 939 bp 1 exon #389
exon 106194 107132
mRNA PPP1R5_mrna_build.1 1160 bp 2 exons #388
exon 103997 104103
exon 106193 107245
Allele GB:AC020691_2 387 106523 106523 G>T
source wetSNP GB:AC020691_2.v106523.T>G
consequence PPP1R5_cds.1 389 Missense 110-110 D>E
Allele GB:AC020691_2 387 106658 106658 A>G
source isSNP SNP00041149
consequence PPP1R5_cds.1 389 Silent 155-155 E
Allele GB:AC020691_2 387 107183 107183 G>T
source isSNP SNP00060710
consequence PPP1R5_cds.1 389
GIF PPP1R5-genomic-fwd.gif



PRELP

Full name : proline arginine-rich end leucine-rich repeat protein
Link : PRELP_link_cdna

Subsequence GB:HSU29089 1 1560 #390
CDS GB:HSU29089.1 1149 bp #391
ORF 129 1277
Allele GB:HSU29089 390 1170 1170 G>T
source isSNP SNP00001359
consequence GB:HSU29089.1 391 Missense 348-348 N>H
Allele GB:HSU29089 390 1489 1489 G>T
source isSNP SNP00001361
consequence GB:HSU29089.1 391
GIF PRELP-cdna-fwd.gif
Link : PRELP_link_genomic

Subsequence PRELP_cds.1 82496 86192 #392
Subsequence GB:AC022000_1 1 154681 #393
Subsequence PRELP_mrna_build.1 75139 86474 #394
CDS PRELP_cds.1 1149 bp 2 exons #392
exon 82496 83468
exon 86017 86192
mRNA PRELP_mrna_build.1 1559 bp 3 exons #394
exon 75139 75250
exon 82480 83468
exon 86017 86474
Allele GB:AC022000_1 393 86085 86085 G>T
source isSNP SNP00001359
consequence PRELP_cds.1 392 Missense 348-348 N>H
Allele GB:AC022000_1 393 86404 86404 G>T
source isSNP SNP00001361
consequence PRELP_cds.1 392
GIF PRELP-genomic-fwd.gif



PRSS11 

Full name : serine protease . 

Link : FL_1787335_link_cdna

Subsequence FN:1787335CB1 2054 #395
CDS FN:1787335CB1.1 1443 bp #396
ORF 49 1491
Allele FN:1787335CB1 395 150 150 A>G
source isSNP SNP00068999
consequence FN:1787335CB1.1 396 Silent 34-34 A
Allele FN:1787335CB1 395 156 156 G>T
source isSNP SNP00117078
consequence FN:1787335CB1.1 396 Silent 36-36 G
Allele FN:1787335CB1 395 914 914 A>G
source isSNP SNP00120314
consequence FN:1787335CB1.1 396 Missense 289-289 Q>R
Allele FN:1787335CB1 395 1321 1321 C>G
source isSNP SNP00105589
consequence FN:1787335CB1.1 396 Missense 425-425 A>P
Allele FN:1787335CB1 395 1521 1521 A>G
source isSNP SNP00105590
consequence FN:1787335CB1.1 396
GIF PRSS11-cdna-fwd.gif
Link : FL_1787335_link_genomic 

Subsequence GB : AF157 623_1_17 8733 5CB1 

Subsequence GB : AF157623_1 1 79597 

Subsequence FL_1787335_mrna_build. 1 17478 

CDS GB:AF157623_1_1787335CD1 1443 bp 

exon 17526 17997 
44869 
45494 
62755 
63272 
64640 
66023 
67922 
70213 

FL_1787335_mrna_build.l 2039 bp 
17478 17997 
44869 
45494 
62755 
63272 
64640 
66023 
67922 
70761 

GB:AF157623_1 398 17627 17627 

source isSNP SNP00068999 

consequence GB : AF157 623_1_17 87335CD1 



17526 70213 #397 
#398 

70761 #399 

9 exons #397 



exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 

mRNA

exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



44770 
45290 
62561 
63240 
64526 
65966 
67827 
70045 



44770 
45290 
62561 
63240 
64526 
65966 
67827 
70045 



9 exons 



#399 



34-34 A 
Allele 



36-36 G 
Allele 



Allele 



Allele 



Allele 



251-251 
Allele 



Allele 



Allele 



GB:AF157623_1 398 17633 17633 

source isSNP SNP00117078 

consequence GB : AFl57623_l_1787335CDl 



A>G 



G>T 



A>G 



GB:AF157623_1 398 21721 21721

source isSNP SNP00101582

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 35790 35790 A>G

source isSNP SNP00049308

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 44762 44762

source wetSNP GB:AF157623_

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 45470 45470

source wetSNP GB:AF157623_

consequence GB:AF157623_1_1787335CD1

GB:AF157623_1 398 45587 45587

source wetSNP GB:AF157623_

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 47792 47792

source isSNP SNP00105588

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 47834 47834

source isSNP SNP00120312

consequence GB:AF157623_1_1787335CD1




397 Silent 



397 Silent 



397 Intron 



3 97 Intron 

G>T 

l.v44762.G>T 

397 Intron 

A>G 

1.V45470.OT 

397 Silent 

A>G 

1.V455 87 .OT 

397 Intron 

A>G 



A>G 



397 Intron 



39 7 Intron 
Allele



Allele 



Allele 



Allele 



289-289 
Allele 



Allele 



GB:AF157623_ 

source 

consequence 

GB:AF157623_ 

source 

consequence 

GB:AF157623_ 

source 

consequence 

GB:AF157623_ 

source 

consequence 

Q>R 

GB:AF157623_ 

source 

consequence 

GB:AF157623. 

source 

consequence 



1 398 47913 47913 A>G

isSNP SNP00120313

GB:AF157623_1_1787335CD1 397 Intron

1 398 62541 62541 A>G

wetSNP GB:AF157623_1.v62541.G>A

GB:AF157623_1_1787335CD1 397 Intron

1 398 62545 62545 A>G

wetSNP GB:AF157623_1.v62545.G>A

GB:AF157623_1_1787335CD1 397 Intron

1 398 62649 62649 A>G

isSNP SNP00120314

GB:AF157623_1_1787335CD1 397 Missense

1 398 63355 63360 TGTTTT>TT

wetSNP GB:AF157623_1.v63355.TGTTTT>TT

GB:AF157623_1_1787335CD1 397 Intron

1 398 70243 70243 A>G

isSNP SNP00105590

GB:AF157623_1_1787335CD1 397 3'



GIF PRSS11-genomic-fwd.gif



PTGS2 

Full name : Prostaglandin-endoperoxide Synthase 2 
Link : PTGS2_link_cdna 

Subsequence EM : HSCYCLOX 



Allele EM: HSCYCLOX 400 

source isSNP 

Allele EM: HSCYCLOX 400 

source isSNP 

Allele EM: HSCYCLOX 400 

source isSNP 

Allele EM: HSCYCLOX 400 

source isSNP 

Allele EM: HSCYCLOX 400 

source isSNP
Link : PTGS2_link_genomic 



1 3387 
403 403 
SNP00046167
880 880 
SNP00076329 
2033 2033 
SNP00076330 
2300 2300 
SNP00046168 
2983 2983 
SNP00046169 



#400 
OG 



Subsequence 
Subsequence 
Subsequence 
CDS PTGS2_cds.l 

exon 1925 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

mRNA 

exon 
exon 



2777 
3014 
3811 
4670 
5584 
5787 
6315 
7103 
7737 



G>T 



A>G 



A>G 



A>G 



GB:HUMPTGS2 101 11097
PTGS2_cds.1 1925 8146
PTGS2_mrna_build.1
1815 bp 10 exons

1976 
2893 
3157 
3954 
4851 
5667 
6033 
6601 
7250 
8146 

PTGS2_mrna_build.l 3373 bp 

1828 1976 
2777 2893 




#401 
#402 
1828 
#402 



9607 #403 



10 



#403 
exon 3014 3157
exon 3811 3954
exon 4670 4851
exon 5584 5667
exon 5787 6033
exon 6315 6601
exon 7103 7250
exon 7737 9607



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB: HUMPTGS 2 

source 

consequence 

GB : HUMPTGS 2 

source 

consequence 

GB: HUMPTGS 2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS 2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS 2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 
GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS2 

source 

consequence 

GB : HUMPTGS 2 

source 

consequence 

GB : HUMPTGS 2 

source 



401 3050 
wetSNP 
PTGS2_cds . 1 
401 3090 
wetSNP 
PTGS2_cds . 1 
401 3174 
wetSNP 
PTGS2_cds . 1 
401 3793 
wetSNP 
PTGS2__cds . 1 
401 3829 
wetSNP 
PTGS2_cds . 1 
401 5605 
wetSNP 
PTGS2_cds . 1 
401 5676 
wetSNP 
PTGS2_cds . 1 
401 5746 
isSNP SNP00076329 
PTGS2_cds.l 402 
401 6249 
wetSNP 
PTGS2_cds . 1 
401 6444 
wetSNP 
PTGS2_cds . 1 
401 6453 
wetSNP 
PTGS2_cds . 1 
401 7581 
wetSNP 
PTGS2_cds . 1 
401 7763 
wetSNP 
PTGS2_cds . 1 
401 7986 
wetSNP 
PTGS2_cds . 1 
401 8167 
isSNP SNP00076330 
PTGS2_cds.l 402 
401 8434 8434 
isSNP SNP00046168



3050 OG 
GB : HUMPTGS2 . v3 050 . 
402 Silent 
309 0 A>G 
GB : HUMPTGS2 . v3 09 0 . 
402 Intron 
3174 OG 
GB : HUMPTGS 2 . v3 1 7 4 . 
402 Intron 
3793 A>G 
GB : HUMPTGS 2 . v3 7 9 3 . 
402 Silent 
3829 A>G 
GB: HUMPTGS 2 .v382 9 . 
402 Silent 
5605 A>G 
GB: HUMPTGS 2 . v5605 . 
402 Intron 
5681 TATTTT>TT 
GB: HUMPTGS 2 .v5676 . 
402 Intron 
5746 G>T 



G>G 
102-102 

C>T 



G>C 



OT 
132-132 

T>C 
144-144 

G>A 



TATTTT>TT 



V 



261-261 



Stop 
6249 A>G 
GB : HUMPTGS 2 .v6249 
402 Silent 
6444 A>G 
GB : HUMPTGS 2 .v6444 
402 Silent 
6453 A>G 
GB: HUMPTGS 2 . v6453 
402 Silent 
7581 A>G 
GB : HUMPTGS 2 .v7581 
402 Intron 
7763 A>G 
GB : HUMPTGS 2 . v7 7 6 3 
402 Missense 
7986 G>T 
GB : HUMPTGS 2 . v7 9 8 6 
402 Silent 
8167 A>G 

3 ' 
A>G 



. G>A 

335-335 

. G>A 

400-400 

.T>C 

403-403 

.T>C 



,T>C 

511-511 

.OA 

585-585 



V 



H 



V>A 
consequence PTGS2_cds.1 402 3'

Allele GB:HUMPTGS2 401 8473 8473 A>G

source isSNP SNP00012871

consequence PTGS2_cds.1 402 3'

Allele GB:HUMPTGS2 401 9102 9102 A>G

source isSNP SNP00046169

consequence PTGS2_cds.1 402 3'
GIF PTGS2-genomic-fwd.gif



PTHLH 

Full name : PTHLH 

Link : PTHLH_link_genomic

Subsequence PTHLH_cds.1 106964 117899 #404
Subsequence GB:AC008011_6 1 183178 #405
Subsequence PTHLH_mrna_build.1 106942 118367 #406
CDS PTHLH_cds.1 534 bp 3 exons #404
exon 106964 107064
exon 112688 113110
exon 117890 117899
mRNA PTHLH_mrna_build.1 1024 bp 3 exons #406
exon 106942 107064
exon 112688 113110
exon 117890 118367
Allele GB:AC008011_6 405 113450 113450 A>G
source isSNP SNP00043978
consequence PTHLH_cds.1 404 Intron
Allele GB:AC008011_6 405 115075 115075 A>G
source dbSNP gnl|dbSNP|ss1455356_allele
consequence PTHLH_cds.1 404 Intron
Allele GB:AC008011_6 405 115160 115160 A>G
source dbSNP gnl|dbSNP|ss1067559_allele
consequence PTHLH_cds.1 404 Intron
GIF PTHLH-genomic-fwd.gif



PTHR1 

Full name : PTHR1
Link : PTHR1_link_cdna

Subsequence GB:HUMPTHR 1 1948 #407
CDS GB:HUMPTHR.1 1782 bp #408
ORF 29 1810
Allele GB:HUMPTHR 407 1417 1417 A>G
source isSNP SNP00007059
consequence GB:HUMPTHR.1 408 Silent 463-463 N
GIF PTHR1-cdna-fwd.gif
Link : PTHR1_link_genomic



Subsequence GB:HSPTHPRH1 1 262 #409
Subsequence GB:HSPTHPRH2 363 769 #410
Subsequence GB:HSPTHPRH3 870 1168 #411
Subsequence GB:HSPTHPRH4 1269 2146 #412
Subsequence GB:HSPTHPRH5 2247 3249 #413
Subsequence GB:HSPTHPRH6 3350 4062 #414
PCT/US02/41225






Subsequence GB:HSPTHPRH7 4163 4475 #415
Subsequence GB:HSPTHPRH8 4576 4995 #416
Subsequence GB:HSPTHPRH9 5096 5696 #417
Subsequence PTHR1_cds.1 107 5558 #418
Subsequence PTHR1_mrna_build.1 79 5696 #419
CDS PTHR1_cds.1 1782 bp 14 exons #418
exon 107 181
exon 456 558
exon 936 1070
exon 1436 1546
exon 1655 1773
exon 1959 2053
exon 2351 2546
exon 2980 3133
exon 3547 3607
exon 3938 4004
exon 4273 4367
exon 4628 4769
exon 4851 4892
exon 5172 5558
mRNA PTHR1_mrna_build.1 1948 bp 14 exons #419
exon 79 181
exon 456 558
exon 936 1070
exon 1436 1546
exon 1655 1773
exon 1959 2053
exon 2351 2546
exon 2980 3133
exon 3547 3607
exon 3938 4004
exon 4273 4367
exon 4628 4769
exon 4851 4892
exon 5172 5696
Allele GB:HSPTHPRH3 411 104 104 A>G
source wetSNP GB:HSPTHPRH3.v104.G>A
consequence PTHR1_cds.1 418 Silent 72-72
Allele GB:HSPTHPRH8 416 311 311 A>G
source wetSNP GB:HSPTHPRH8.v311.T>C
consequence PTHR1_cds.1 418 Silent 463-463 N
GIF PTHR1-genomic-fwd.gif



RARA 

Full name : retinoic acid receptor, alpha 

Link : RARA_link_cdna

Subsequence GB:NM_000964 1 2907 #420
CDS GB:NM_000964.1 1389 bp #421
ORF 103 1491
Allele GB:NM_000964 420 2327 2327 A>G
source isSNP SNP00016145
consequence GB:NM_000964.1 421 3'
Allele GB:NM_000964 420 2439 2439 A>G



WO 03/054166 

source isSNP SNP00049381 

consequence GB:NM_000964.1 421 3'

GIF RARA-cdna-fwd.gif



RIN1 

Full name : ras inhibitor 
Link : RIN1_link_cdna

Subsequence GB:HUMRASINF 1 1285 #422
Allele GB:HUMRASINF 422 260 260 A>G
source isSNP SNP00123606
Allele GB:HUMRASINF 422 424 424 A>G
source isSNP SNP00123607
Allele GB:HUMRASINF 422 722 722 A>G
source isSNP SNP00033587
Allele GB:HUMRASINF 422 921 921 A>G
source isSNP SNP00007808



ROR2 

Full name : receptor tyrosine kinase-like orphan receptor 2

Link : ROR2_link_cdna

Subsequence GB:NM_004560 1 4092 #423
CDS GB:NM_004560.1 2832 bp #424
ORF 200 3031
Allele GB:NM_004560 423 932 932 A>G
source isSNP SNP00098926
consequence GB:NM_004560.1 424 Missense 245-245 A>T
Allele GB:NM_004560 423 1460 1460 A>G
source isSNP SNP00098927
consequence GB:NM_004560.1 424 Missense 421-421 L>F
Allele GB:NM_004560 423 1973 1973 A>G
source isSNP SNP00098928
consequence GB:NM_004560.1 424 Missense 592-592 F>L
Allele GB:NM_004560 423 2287 2287 A>G
source isSNP SNP00028168
consequence GB:NM_004560.1 424 Silent 696-696
Allele GB:NM_004560 423 2353 2353 A>G
source isSNP SNP00098929
consequence GB:NM_004560.1 424 Silent 718-718
Allele GB:NM_004560 423 2654 2654 A>G
source isSNP SNP00028169
consequence GB:NM_004560.1 424 Missense 819-819 V>I
Allele GB:NM_004560 423 3743 3743 A>G
source isSNP SNP00028170
consequence GB:NM_004560.1 424 3'
Allele GB:NM_004560 423 3872 3872 G>T
source isSNP SNP00074568
consequence GB:NM_004560.1 424 3'
Allele GB:NM_004560 423 3919 3919 G>T
source isSNP SNP00074569
consequence GB:NM_004560.1 424
GIF ROR2-cdna-fwd.gif

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TABLE 1 (Cont.) 
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RORA 

Full name : RAR-related orphan receptor alpha 
Link : RORA_link_genomic



19553 
16417 
10425 
9288 
8488 
6690 
5625 
3240 



Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
mRNA 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 
CDS RORA_cds.l 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 
CDS RORA_cds . 2 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

mRNA 

exon 
exon 



9454 
21185 



RORA_cds.l 64220 3076 #425 

RORA_cds . 2 64220 3076 #426 

RORA_cds.4 64220 3076 #427 

GB:AC012344_4_000018 1 

GB:AC012344_4_000020 9555 

GB:AC012344_4_000021 21286 34347 

GB:AC012344_4_000019 34448 43824 

GB:AC012344_4_000023 43925 65900 

RORA_mrna_build.l 64309 2885 #433 

RORA_mrna_build . 4 64290 2885 #434 

RORA_mrna_build.4 1908 bp 11 exons 
64290 64084 



#428 
#429 
#430 
#431 
#432 



#434 



51847 51714 
25290 25205 



19412 

16022 

10304 

9156 

8381 

6580 

5513 

2885 

1671 bp 



64220 64084 

43229 43148 

41851 41776 

25290 25205 

19553 19412 

16417 16022 

10425 10304 

9288 9156 



8488 
6690 
5625 
3240 



8381 
6580 
5513 
3076 
1275 bp 



64220 64084 

43229 43148 

41851 41776 

25290 25205 

19553 19412 

10425 10304 



9288 
8488 
6690 
5625 
3240 



12 exons 



#425 



11 exons 



#426 



9156 
8381 
6580 
5513 
3076 

RORA_mrna_build . 1 1951 bp 
64309 64084 
43229 43148 



12 exons 



#433 



exon 41851 41776
exon 25290 25205
exon 19553 19412
exon 16417 16022
exon 10425 10304
exon 9288 9156
exon 8488 8381
exon 6690 6580
exon 5625 5513
exon 3240 2885
CDS RORA_cds.4 1647 bp 11 exons #427
exon 64220 64084
exon 51847 51714
exon 25290 25205
exon 19553 19412
exon 16417 16022
exon 10425 10304
exon 9288 9156
exon 8488 8381
exon 6690 6580
exon 5625 5513
exon 3240 3076



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB:AC012344_4_000020 429 11153 11153 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11182 11182 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11183 11183 A>T

source dbSNP gnl|dbSNP|ss507731_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11254 11254 A>G

source dbSNP gnl|dbSNP|ss380580_allele



consequence RORA_cds . 1 42 5 

consequence RORA_cds . 2 42 6 

consequence RORA_cds . 4 427 

GB:AC012344_4_000020 429 
source 

consequence RORA_cds . 1 425 

consequence RORA_cds . 2 426 

consequence RORA_cds.4 427 

GB:AC012344_4_000020 429 11264 11264 A>G
source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron

consequence RORA_cds.2 426 Intron

consequence RORA_cds.4 427 Intron

GB:AC012344_4_000020 429 11265 11265 A>T
source dbSNP gnl|dbSNP|ss507731_allele

consequence RORA_cds.1 425 Intron



Intron 
Intron 
Intron 

11255 11255 A>T 
dbSNP gnl|dbSNP|ss507731_allele 

Intron 
Intron 
Intron 
11264 11264 
consequence RORA_cds.2 426 Intron

consequence RORA_cds.4 427 Intron

Allele GB:AC012344_4_000020 429 11320 11320 A>G
source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron

consequence RORA_cds.2 426 Intron

consequence RORA_cds.4 427 Intron

GIF RORA-genomic-rev.gif



SCRG1 

Full name : scrapie responsive protein 
Link : SCRG1_link_genomic

Subsequence SCRG1_cds.1 30577 33650 #435
Subsequence GB:AC009588_4 1 164772 #436
Subsequence SCRG1_mrna_build.1 30561 33845 #437
CDS SCRG1_cds.1 297 bp 2 exons #435
exon 30577 30818
exon 33596 33650
mRNA SCRG1_mrna_build.1 508 bp 2 exons #437
exon 30561 30818
exon 33596 33845



GIF SCRGl-genomic-fwd.gif 



SCYA20

Full name : small inducible cytokine subfamily A member 20
Link : SCYA20_link_cdna

Subsequence GB:HSU64197 1 821 #438
CDS GB:HSU64197.1 288 bp #439
ORF 43 330

Allele GB:HSU64197 438 341 341 A>G
source isSNP SNP00037526
consequence GB:HSU64197.1 439

Allele GB:HSU64197 438 728 728 A>G
source isSNP SNP00037527
consequence GB:HSU64197.1 439

GIF SCYA20-cdna-fwd.gif
Link : SCYA20_link_genomic

Subsequence SCYA20_cds.1 73925 77096 #440
Subsequence GB:AC027560_2 1 129588 #441
Subsequence SCYA20_mrna_build.1 73883 77577 #442
CDS SCYA20_cds.1 288 bp 4 exons #440
exon 73925 74000
exon 75470 75581
exon 76320 76397
exon 77075 77096
mRNA SCYA20_mrna_build.1 811 bp 4 exons #442
exon 73883 74000
exon 75470 75581
exon 76320 76397
exon 77075 77577

Allele GB:AC027560_2 441 77107 77107 A>G

source isSNP SNP00037526
consequence SCYA20_cds.1 440 3'

Allele GB:AC027560_2 441 77493 77493 A>G
source isSNP SNP00037527
consequence SCYA20_cds.1 440 3'

GIF SCYA20-genomic-fwd.gif



SDC2 

Full name : syndecan 2 
Link : SDC2_link_cdna 

Subsequence GB : HUMHSPGC 



CDS GB : HUMHSPGC . 2 

ORF 1 1194 

Allele GB: HUMHSPGC 

source 
consequence 

Allele GB: HUMHSPGC 

source 
consequence 

Allele GB: HUMHSPGC 

source 
consequence 

Allele GB: HUMHSPGC 

source 
consequence 

GIF SDC2-cdna-fwd.gif



1194 bp 



435 



3414 



435 



443 

isSNP SNP00116695
GB:HUMHSPGC.2
443 463 463
isSNP SNP00050825
GB:HUMHSPGC.2
443 741 741
isSNP SNP00033651
GB:HUMHSPGC.2
443 1041 1041
isSNP SNP00099428
GB:HUMHSPGC.2



#443 
#444 

A>G 

444 
C>G

444 
A>G 

444 
G>T 

444 



Silent 



Missense 



Silent 



Silent 



145-145 



155-155 



247-247 



347-347 



L>V 



SDC4 

Full name : syndecan 4 

Link : FL_1394592_link_cdna 

Subsequence FN: 1394592CB1 3 

CDS FN:1394592CB1.1 594 bp 

ORF 23 616 
CDS GB:HS453C12_1394592CD1 594 bp 



2112 
#446 



#445 



#272 



ORF 
ORF 
ORF 
ORF 
ORF 

mRNA 

ORF 
ORF 
ORF 
ORF 
ORF 
Allele 



Allele 



87967 88026 

100431 

103282 

105787 

108936 



100569 
103328 
105985 
109084 



2110 bp 



FL_1394592_mrna_build.1
87945 88026 
100431 100569 
103282 103328 
105787 105985 
108936 110578 

FN:1394592CB1 445 653 653

source isSNP SNP00124074

consequence FN:1394592CB1.1 446

FN:1394592CB1 445 749 749

source isSNP SNP00124075

consequence FN:1394592CB1.1 446




#274 



C>G

3'
A>G
Allele FN:1394592CB1 445 856 856 A>G

source isSNP SNP00053065

consequence FN:1394592CB1.1 446 3'

FN:1394592CB1 445 884 884 A>G

source isSNP SNP00066145

consequence FN:1394592CB1.1 446 3'

FN:1394592CB1 445 1048 1048 A>G

source isSNP SNP00066146

consequence FN:1394592CB1.1 446 3'

FN:1394592CB1 445 1214 1214 A>G

source isSNP SNP00029910

consequence FN:1394592CB1.1 446 3'
SDC4-cdna-fwd.gif
FL_1250708_link_genomic



Allele 



Allele 



Allele 



GIF 
Link : 



Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
CDS GB:HS453C12 

exon 

exon 

exon 

exon 
• exon 
mRNA 

exon 
exon 
exon 
exon 
exon 
Allele 



#271 

109084 #272 
10528 #273 
110578 #274 
6152 #275 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



GB:HS453C12 1 147620 
GB:HS453C12_1394592CD1 87967 
GB:HS453C12__2027624CD1 20194 
FL_1394592_mrna_build. 1 87945 
FL_2027624_mrna_build.l 20197 
OA21_cds.l 20194 17050 #276 
1394592CD1 594 bp' 5 exons #272 

87967 88026 
100431 100569 
103282 103328 
105787 105985 
108936 109084 

FL_1394592_mrna_build.1 2110 bp 5 exons

87945 88026 

100569 
103328 
105985 
110578 

271 90320 90320 A>G 
isSNP SNP00026142 
GB:HS453C12_1394592CD1 272 
271 90420 90420 C>G
isSNP SNP00026143
GB:HS453C12_1394592CD1 272
271 96768 96768 A>G
dbSNP gnl|dbSNP|ss736312_allele
GB:HS453C12_1394592CD1 272 
271 109121 109121 
isSNP SNP00124074 
GB:HS453C12_1394592CD1 272 
271 109217 109217 
isSNP SNP00124075 
GB:HS453C12_1394592CD1 272 
271 109324 109324 
isSNP SNP00053065 
GB:HS453C12_1394592CD1 272 
271 109352 109352 
isSNP SNP00066145 
GB:HS453C12_1394592CD1 272 
271 109516 109516 
isSNP SNP00066146 



#274 



100431 
103282 
105787 
108936 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 



Intron 



Intron 



Intron 
C>G

3' 
A>G 

3' 
A>G 

3' 
A>G 

3 ' 
A>G 

consequence GB:HS453C12_1394592CD1 272 3'

Allele GB:HS453C12 271 109682 109682 A>G

source isSNP SNP00029910

consequence GB:HS453C12_1394592CD1 272 3'
GIF SDC4-genomic-fwd.gif



SEDL 

Full name : sedlin 
Link : SEDL_link_cdna

Subsequence GB:NM_014563_1 1 2816 #447
CDS GB:NM_014563_1.1 423 bp #448
ORF 230 652

Allele GB:NM_014563_1 447 991 991 G>T
source dbSNP gnl|dbSNP|ss380525_allele
source dbSNP gnl|dbSNP|ss531221_allele
consequence GB:NM_014563_1.1 448 3'

Allele GB:NM_014563_1 447 2026 2026 A>G
source dbSNP gnl|dbSNP|ss637643_allele
source dbSNP gnl|dbSNP|ss869682_allele
source dbSNP gnl|dbSNP|ss1272499_allele
source dbSNP gnl|dbSNP|ss232503_allele
source dbSNP gnl|dbSNP|ss459122_allele
consequence GB:NM_014563_1.1 448 3'

Allele GB:NM_014563_1 447 2391 2391 C>G
source isSNP SNP00010387
consequence GB:NM_014563_1.1 448 3'

GIF SEDL-cdna-fwd.gif



SKI 

Full name : v-ski avian sarcoma viral oncogene homolog 

Link : SKI_link_cdna 

Subsequence GB:NM_003036 1 3511 #449 

CDS GB:NM_003036.1 2187 bp #450



ORF 
Allele 



Allele 



Allele 



73 2259 
GB:NM_00303 6 
source isSNP 
consequence GB : NM_ 
GB:NM_00303 6 
source isSNP 
consequence GB : NM_ 
GB:NM„00303 6 
source isSNP 



449 528 
SNP00068450 
003036.1 
449 1146 
SNP00068451 
.003036.1 
449 3482 
SNP00068452 



consequence 
GIF SKI-cdna-fwd.gif



GB:NM_003036.1 



528 

450 
1146 

450 
3482 

450 



A>G 

Silent 
A>G 

Silent 
C>G



152-152 



358-358 



R 



SOD2 

Full name : superoxide dismutase 2, mitochondrial 

Link : SOD2_link_cdna

Subsequence EM:HSSOD 1 1026 #451




WO 03/054166 

Allele EM:HSSOD 451 243 243 A>G
source isSNP SNP00021476

Link : SOD2_link_genomic

Subsequence EM:S77127 101 12957 #452
Subsequence SOD2_link_cds.1 957 11597 #453
Subsequence SOD2_mrna_build.1 953 11950 #454
mRNA SOD2_mrna_build.1 1026 bp 5 exons #454
exon 953 979
exon 1260 1462
exon 5859 5975
exon 9061 9240
exon 11452 11950
CDS SOD2_link_cds.1 669 bp 5 exons #453
exon 957 979
exon 1260 1462
exon 5859 5975
exon 9061 9240
exon 11452 11597

Allele EM:S77127 452 1183 1183 A>G
source isSNP SNP00003080
source wetSNP EM:S77127.v1183.C>T
consequence SOD2_link_cds.1 453 Missense

Allele EM:S77127 452 1456 1456 G>T
source wetSNP EM:S77127.v1456.A>C
consequence SOD2_link_cds.1 453 Intron

Allele EM:S77127 452 1734 1734 A>G
source isSNP SNP00107369
consequence SOD2_link_cds.1 453 Intron

GIF SOD2-genomic-fwd.gif



SOD3 

Full name : superoxide dismutase 3, extracellular 

Link : SOD3_link_cdna

Subsequence GB:SOD3 1 1984 #455
CDS GB:SOD3.1 723 bp #456
ORF 664 1386

Allele GB:SOD3 455 835 835 A>G
source isSNP SNP00033027
consequence GB:SOD3.1 456 Missense 58-58 T>A

Allele GB:SOD3 455 874 874 A>G
source isSNP SNP00062433
consequence GB:SOD3.1 456 Silent 71-71 L

Allele GB:SOD3 455 1469 1469 A>G
source isSNP SNP00067750
consequence GB:SOD3.1 456

Allele GB:SOD3 455 1496 1496 A>G
source isSNP SNP00007500
consequence GB:SOD3.1 456

Allele GB:SOD3 455 1817 1817 G>T
source isSNP SNP00104042
consequence GB:SOD3.1 456

Allele GB:SOD3 455 1826 1826 A>G
source isSNP SNP00031110
consequence GB:SOD3.1 456 3'

Allele GB:SOD3 455 1932 1932 A>G
source isSNP SNP00050239
consequence GB:SOD3.1 456 3'

GIF SOD3-cdna-fwd.gif
Link : FL_1534327_link_genomic

Subsequence GB:HSU10116 1 10079 #457
Subsequence GB:HSU10116_1534327CD1 5085 5807 #458
Subsequence FL_1534327_mrna_build.1 1130 6405 #459
mRNA FL_1534327_mrna_build.1 1427 bp 2 exons #459
exon 1130 1219
exon 5069 6405
CDS GB:HSU10116_1534327CD1 723 bp 1 exon #458
exon 5085 5807

Allele GB:HSU10116 457 5256 5256 A>G
source isSNP SNP00033027
consequence GB:HSU10116_1534327CD1 458 Missense 58-58 T>A

Allele GB:HSU10116 457 5295 5295 A>G
source isSNP SNP00062433
consequence GB:HSU10116_1534327CD1 458 Silent 71-71 L

Allele GB:HSU10116 457 5890 5890 A>G
source isSNP SNP00067750
consequence GB:HSU10116_1534327CD1 458

Allele GB:HSU10116 457 5917 5917 A>G
source isSNP SNP00007500
consequence GB:HSU10116_1534327CD1 458

Allele GB:HSU10116 457 6238 6238 G>T
source isSNP SNP00104042
consequence GB:HSU10116_1534327CD1 458

Allele GB:HSU10116 457 6247 6247 A>G
source isSNP SNP00031110
consequence GB:HSU10116_1534327CD1 458

Allele GB:HSU10116 457 6353 6353 A>G
source isSNP SNP00050239
consequence GB:HSU10116_1534327CD1 458

GIF SOD3-genomic-fwd.gif



SOX9 

Full name : SOX9 

Link : SOX9_link_cdna

Subsequence GB:HSSOX9MRN

CDS GB:HSSOX9MRN.2 1530 bp 



ORF 
Allele 



Allele 



Allele 



Allele 



360 1889 

GB:HSSOX9MRN 460 866

source isSNP SNP00092616

consequence GB:HSSOX9MRN.2

GB:HSSOX9MRN 460 1571

source isSNP SNP00108001

consequence GB:HSSOX9MRN.2

GB:HSSOX9MRN 460 1912

source isSNP SNP00055269

consequence GB:HSSOX9MRN.2

GB:HSSOX9MRN 460 2374




3923 
#461 

866 



461 - 
1912 



#460 



A>G 



461 Silent 
1571 A>G 



Silent 
G>T 



461 3' 
2374 A>G 



169-169 



404-404 



H 



source isSNP SNP00041454
consequence GB:HSSOX9MRN.2 461 3'

Allele GB:HSSOX9MRN 460 3224 3224 C>G
source isSNP SNP00061027
consequence GB:HSSOX9MRN.2 461 3'

Allele GB:HSSOX9MRN 460 3470 3470 A>G
source isSNP SNP00055270
consequence GB:HSSOX9MRN.2 461

GIF SOX9-cdna-fwd.gif
Link : FL_5425567_link_genomic

Subsequence GB:AC007461_8_5425567CD1 63884 60889 #462
Subsequence GB:AC007461_8 1 180385 #463
Subsequence SOX9_mrna_build.1 64243 58856 #464
CDS GB:AC007461_8_5425567CD1 1530 bp 3 exons #462
exon 63884 63454
exon 62557 62304
exon 61733 60889
mRNA SOX9_mrna_build.1 3922 bp 3 exons #464
exon 64243 63454
exon 62557 62304
exon 61733 58856

Allele GB:AC007461_8 463 59309 59309 A>G
source isSNP SNP00055270
consequence GB:AC007461_8_5425567CD1 462

Allele GB:AC007461_8 463 59555 59555 C>G
source isSNP SNP00061027
consequence GB:AC007461_8_5425567CD1 462

Allele GB:AC007461_8 463 60078 60078 A>G
source isSNP SNP00010889
consequence GB:AC007461_8_5425567CD1 462

Allele GB:AC007461_8 463 60404 60404 A>G
source isSNP SNP00041454
consequence GB:AC007461_8_5425567CD1 462

Allele GB:AC007461_8 463 60866 60866 G>T
source isSNP SNP00055269
consequence GB:AC007461_8_5425567CD1 462

Allele GB:AC007461_8 463 61207 61207 A>G
source isSNP SNP00108001
consequence GB:AC007461_8_5425567CD1 462 Silent 404-404 P

Allele GB:AC007461_8 463 62482 62482 A>G
source isSNP SNP00092616
source wetSNP GB:AC007461_8.v62482.G>A
consequence GB:AC007461_8_5425567CD1 462 Silent 169-169 H

GIF SOX9-genomic-rev.gif

STATI2 

Full name : STAT-induced STAT inhibitor-2 

Link : FL_2787140_link_cdna 

Subsequence FN:2787140CB1 1 2587 #465
CDS FN:2787140CB1.1 927 bp #466
ORF 98 1024

Allele FN:2787140CB1 465 1325 1325 A>G
source isSNP SNP00041483
consequence FN:2787140CB1.1 466 3'

Allele FN:2787140CB1 465 1442 1442 G>T
source isSNP SNP00106962
consequence FN:2787140CB1.1 466 3'

Allele FN:2787140CB1 465 1470 1470 A>G
source isSNP SNP00041484
consequence FN:2787140CB1.1 466 3'

Allele FN:2787140CB1 465 1974 1974 A>G
source isSNP SNP00106963
consequence FN:2787140CB1.1 466

GIF STATI2-cdna-fwd.gif
Link : FL_1405668_link_genomic

Subsequence GB:AC012085_1 1 177866 #467
Subsequence FL_2787140_mrna_build.1 42013 47745 #468
mRNA FL_2787140_mrna_build.1 2580 bp 3 exons #468
exon 42013 42225
exon 43694 44045
exon 45731 47745

Allele GB:AC012085_1 467 44268 44268 A>G
source isSNP SNP00070304

Allele GB:AC012085_1 467 46492 46492 A>G
source isSNP SNP00041483

Allele GB:AC012085_1 467 46609 46609 G>T
source isSNP SNP00106962

Allele GB:AC012085_1 467 46637 46637 A>G
source isSNP SNP00041484

Allele GB:AC012085_1 467 47141 47141 A>G
source isSNP SNP00106963

GIF STATI2-genomic-fwd.gif



THBS1 

Full name : thrombospondin 1 
Link : THBS1_link_cdna

Subsequence GB:HSTS

CDS GB:HSTS.1 3513 bp



5722 
#470 



ORF 
Allele 



Allele 



Allele 



Allele 



Allele 



112 3624 

GB : HSTS 

source 

consequence 

GB : HSTS 

source 

consequence 

GB : HSTS 

source 

consequence 

GB : HSTS 

source 

consequence 

GB : HSTS 

source 

consequence 



469 1239 1239

isSNP SNP00046537

GB:HSTS.1 470

469 2210 2210

isSNP SNP00046539

GB:HSTS.1 470

469 2979 2979

isSNP SNP00061983

GB:HSTS.1 470

469 3680 3680

isSNP SNP00108514

GB:HSTS.1 470

469 3703 3703

isSNP SNP00013197

GB:HSTS.1 470



#469 



A>G 

Silent 
A>G 

Missense 
A>G 

Silent 
G>T 

3 ' 
A>G 



376-376 



700-700 



956-956 



D 



N>S 

Allele GB:HSTS 469 3905 3905 A>G

source isSNP SNP00093327

consequence GB:HSTS.1 470 3'

Allele GB:HSTS 469 5259 5259 A>G

source isSNP SNP00105437

consequence GB:HSTS.1 470 3'

GIF THBS1-cdna-fwd.gif



TIMP1 

Full name : Tissue Inhibitor of Metalloproteinase 1 
Link : TIMP1_link_cdna

Subsequence FN:411388CB1 1 853 #471
CDS FN:411388CB1.1 621 bp #472
ORF 122 742

Allele FN:411388CB1 471 365 365 C>G
source isSNP SNP00115174
consequence FN:411388CB1.1 472 Missense 82-82 R>G

GIF TIMP1-cdna-fwd.gif
Link : FL_3013907_link_genomic

Subsequence GB:HS230G1 1 125515 #473
Subsequence GB:HS230G1_411388CD1 20559 17287 #474
Subsequence TIMP1_mrna_build.1 21613 17186 #475
mRNA TIMP1_mrna_build.1 843 bp 6 exons #475
exon 21613 21501
exon 20567 20439
exon 19039 18960
exon 18770 18644
exon 18432 18308
exon 17454 17186
CDS GB:HS230G1_411388CD1 621 bp 5 exons #474
exon 20559 20439
exon 19039 18960
exon 18770 18644
exon 18432 18308
exon 17454 17287

Allele GB:HS230G1 473 17434 17434 A>G
source wetSNP GB:HS230G1.v17434.G>A
consequence GB:HS230G1_411388CD1 474 Silent 158-158 I

Allele GB:HS230G1 473 17550 17550 A>G
source isSNP SNP00099224
consequence GB:HS230G1_411388CD1 474 Intron

Allele GB:HS230G1 473 18046 18046 A>G
source isSNP SNP00099223
consequence GB:HS230G1_411388CD1 474 Intron

Allele GB:HS230G1 473 18088 18088 A>G
source isSNP SNP00030937
consequence GB:HS230G1_411388CD1 474 Intron

Allele GB:HS230G1 473 18389 18389 A>G
source wetSNP GB:HS230G1.v18389.A>G
consequence GB:HS230G1_411388CD1 474 Silent 124-124 F

Allele GB:HS230G1 473 18495 18495 C>G
source isSNP SNP00099222
source wetSNP GB:HS230G1.v18495.C>G
consequence GB:HS230G1_411388CD1 474 Intron

Allele GB:HS230G1 473 18711 18711 A>G
source wetSNP GB:HS230G1.v18711.G>A
consequence GB:HS230G1_411388CD1 474 Silent 87-87 P

Allele GB:HS230G1 473 18728 18728 C>G
source isSNP SNP00115174
consequence GB:HS230G1_411388CD1 474 Missense 82-82 R>G

GIF TIMP1-genomic-rev.gif



TIMP2

Full name : Tissue Inhibitor of Metalloproteinase-2
Link : TIMP2_link_genomic

Subsequence TIMP2_cds.1 822 3126 #476
Subsequence GB:S68860_1 1 970 #477
Subsequence GB:U44382_1 1071 1320 #478
Subsequence GB:U44383_1 1421 1644 #479
Subsequence GB:U44384_1 1745 2283 #480
Subsequence GB:U44385_1 2384 3750 #481
Subsequence TIMP2_mrna_build.1 810 3251 #482
CDS TIMP2_cds.1 663 bp 5 exons #476
exon 822 951
exon 1125 1225
exon 1504 1612
exon 1939 2063
exon 2929 3126
mRNA TIMP2_mrna_build.1 800 bp 5 exons #482
exon 810 951
exon 1125 1225
exon 1504 1612
exon 1939 2063
exon 2929 3251

Allele GB:U44383_1 479 155 155 A>G
source wetSNP GB:U44383_1.v155.G>A
consequence TIMP2_cds.1 476 Silent 101-101

GIF TIMP2-genomic-fwd.gif



TNA 

Full name : tetranectin 

Link : TNA_link_cdna

Subsequence GB:NM_003278

CDS GB:NM_003278.1 609 bp



94 702

GB:NM_003278 483 409

source isSNP SNP00007942

consequence GB:NM_003278.1
GB:NM_003278 483 744

source isSNP SNP00007943

consequence GB:NM_003278.1
GIF TNA-cdna-fwd.gif



ORF 
Allele 



Allele 



874 
#484 

409 

484 
744 

484 



#483 



A>G 



Missense 
A>G 



106-106 



S>G 
Link : TNA_link_genomic

Subsequence TNA_cds.1 254 1629 #485
Subsequence TNA_cds.2 254 1629 #486
Subsequence GB:X70910_1 1 570 #487
Subsequence GB:X70911_1 671 978 #488
Subsequence GB:X70912_1 1079 1805 #489
Subsequence TNA_mrna_build.1 164 1776 #490
CDS TNA_cds.1 609 bp 3 exons #485
exon 254 362
exon 829 927
exon 1229 1629
CDS TNA_cds.2 510 bp 2 exons #486
exon 254 362
exon 1229 1629
mRNA TNA_mrna_build.1 846 bp 3 exons #490
exon 164 362
exon 829 927
exon 1229 1776

Allele GB:X70912_1 489 258 258 A>G
source isSNP SNP00007942
consequence TNA_cds.1 485 Missense 106-106 S>G
consequence TNA_cds.2 486 Missense 73-73 S>G

Allele GB:X70912_1 489 593 593 A>G
source isSNP SNP00007943
consequence TNA_cds.1 485
consequence TNA_cds.2 486

GIF TNA-genomic-fwd.gif



TNFAIP6 

Full name : tumor necrosis factor, alpha-induced protein 6 

Link : TNFAIP6_link_cdna

Subsequence GB:NM_007115_1 1 1414 #491
CDS GB:NM_007115_1.1 834 bp #492
ORF 69 902

Allele GB:NM_007115_1 491 499 499 A>G
source isSNP SNP00040822
consequence GB:NM_007115_1.1 492 Missense 144-144 R>Q

Allele GB:NM_007115_1 491 1143 1143 C>G
source isSNP SNP00040823
consequence GB:NM_007115_1.1 492

GIF TNFAIP6-cdna-fwd.gif
Link : FL_1000909_link_genomic

Subsequence GB:AC009311_1_191918CD1 132384 154250 #493
Subsequence GB:AC009311_1 1 160198 #494
Subsequence TNFAIP6_mrna_build.1 132314 154760 #495
mRNA TNFAIP6_mrna_build.1 1414 bp 6 exons #495
exon 132314 132477
exon 138660 138797
exon 140773 140934
exon 144737 144965
exon 148266 148306
exon 154081 154760
CDS GB:AC009311_1_191918CD1 834 bp 6 exons #493
exon 132384 132477
exon 138660 138797
exon 140773 140934
exon 144737 144965
exon 148266 148306
exon 154081 154250

Allele GB:AC009311_1 494 140934 140934 A>G
source wetSNP GB:AC009311_1.v140934.G>A
consequence GB:AC009311_1_191918CD1 493 Missense 132-132 A>T

Allele GB:AC009311_1 494 140942 140942 A>T
source wetSNP GB:AC009311_1.v140942.A>T
consequence GB:AC009311_1_191918CD1 493 Intron

Allele GB:AC009311_1 494 144773 144773 A>G
source isSNP SNP00040822
source wetSNP GB:AC009311_1.v144773.A>G
consequence GB:AC009311_1_191918CD1 493 Missense 144-144 Q>R

Allele GB:AC009311_1 494 148030 148030 A>G
source dbSNP gnl|dbSNP|ss645109_allele
consequence GB:AC009311_1_191918CD1 493 Intron

Allele GB:AC009311_1 494 148229 148229 A>G
source wetSNP GB:AC009311_1.v148229.T>C
consequence GB:AC009311_1_191918CD1 493 Intron

Allele GB:AC009311_1 494 148245 148245 A>G
source wetSNP GB:AC009311_1.v148245.T>C
consequence GB:AC009311_1_191918CD1 493 Intron

Allele GB:AC009311_1 494 154493 154493 C>G
source isSNP SNP00040823
consequence GB:AC009311_1_191918CD1 493 3'

GIF TNFAIP6-genomic-fwd.gif



TNFRSF11B 

Full name : TNFRSF11B 
Link : TNFRSF11B_link_cdna

Subsequence GB:AB002146 1 1206 #496
CDS GB:AB002146.1 1206 bp #497
ORF 1 1206

Allele GB:AB002146 496 768 768 A>G
source isSNP SNP00028816
consequence GB:AB002146.1 497 Silent 256-256

GIF TNFRSF11B-cdna-fwd.gif
Link : TNFRSF11B_link_genomic

Subsequence TNFRSF11B_cds.1 125 9057 #498
Subsequence GB:E15270_1 1 9898 #499
CDS TNFRSF11B_cds.1 1176 bp #498
exon 130 499
exon 4504 4695
exon 6716 6940
exon 8669 9057

Allele GB:E15270_1 499 503 503 A>G
source wetSNP GB:E15270_1.v503.C>T
consequence TNFRSF11B_cds.1 498 Intron

Allele GB:E15270_1 499 4499 4499 A>G
source wetSNP GB:E15270_1.v4499.C>T
consequence TNFRSF11B_cds.1 498 Intron

Allele GB:E15270_1 499 4661 4661 A>G
source wetSNP GB:E15270_1.v4661.C>T
consequence TNFRSF11B_cds.1 498 Silent

Allele GB:E15270_1 499 4749 4752 TCTG>TG
source wetSNP GB:E15270_1.v4749.TCTG>TG
consequence TNFRSF11B_cds.1 498

Allele GB:E15270_1 499 6599 6599 A>G
source wetSNP GB:E15270_1
consequence TNFRSF11B_cds.1 498

Allele GB:E15270_1
source
consequence

Allele GB:E15270_1
source
consequence

GIF TNFRSF11B-genomic-
What is claimed is:

1. A method of determining susceptibility of an individual to joint space narrowing and/or
osteophyte development and/or joint pain comprising identifying whether the individual has at least one
polymorphism in a polynucleotide encoding at least one of the proteins listed in Table 1.

2. The method of claim 1, wherein said proteins listed in Table 1 are selected from the group
consisting of bone morphogenic protein 2 (BMP2), cartilage intermediate layer protein (CILP),
cartilage oligomeric matrix protein (COMP), tissue inhibitor of metalloproteinase 1 (TIMP1),
tetranectin (TNA), matrix metalloproteinase 3 (MMP3), and prostaglandin-endoperoxide synthase 2
(PTGS2).

3. The method of claim 1, wherein the joint space narrowing and/or osteophyte development 
and/or joint pain is associated with a disease. 

4. The method of claim 3 wherein the disease is osteoarthritis. 

5. The method of claim 1 where at least one of the polymorphisms is selected from the
polymorphisms listed in Table 1.

6. The method of claim 1 comprising contacting a sample from the individual with a specific 
binding agent for the polymorphism and determining whether the agent binds to the polymorphism. 

7. The method of claim 1 where the polymorphism in the polynucleotide is determined for
more than one allele of the individual.

8. A method for modulating the susceptibility of an individual to joint space narrowing and/or
osteophyte development and/or joint pain, comprising identifying the individual by the method of claim
1 and administering to the individual a composition comprising an effective amount of an agent which
modulates said susceptibility.

9. The method of claim 8, wherein the joint space narrowing and/or osteophyte development 
and/or joint pain is associated with a disease. 
10. The method of claim 9 wherein the disease is osteoarthritis.

11. A polynucleotide encoding a protein listed in Table 1 having at least one polymorphism in 
the polynucleotide selected from the group of polymorphisms listed in Table 1 for the polynucleotide. 

12. A fragment of a polynucleotide encoding a protein selected from Table 1 having at least 
one polymorphism in the fragment selected from the group of polymorphisms listed in Table 1. 

13. A fragment of claim 12 having a length of 8 to 100 nucleotides. 

14. A fragment of claim 12 having a length of 8 to 30 nucleotides. 

15. A fragment of claim 12 having a length of 9 to 15 nucleotides. 

16. A method of identifying an agent for modulating susceptibility of an individual to joint 
space narrowing and/or osteophyte development and/or joint pain comprising: 

a) contacting a test agent with a polypeptide or a polynucleotide encoding the polypeptide 
selected from the list of Table 1 having at least one of the polymorphisms selected from the list of 
Table 1, 

b) determining whether the agent is capable of binding to the polypeptide or polynucleotide 
encoding the polypeptide, and 

c) determining whether the activity or expression of the polypeptide or polynucleotide 
encoding the polypeptide is modulated. 

17. A method of formulating a composition comprising 

a) identifying an agent for modulating the susceptibility of an individual to joint space 
narrowing and/or osteophyte development and/or joint pain by the method of claim 16, and 

b) formulating the agent with a carrier or diluent. 

18. An agent identified by the method of claim 16. 
19. A composition for modulating the susceptibility of an individual to joint space narrowing
and/or osteophyte development and/or joint pain comprising an agent according to claim 18 and a 
carrier. 

20. A method comprising using an agent of claim 18 in the manufacture of a medicament for
modulating susceptibility to joint space narrowing and/or osteophyte development and/or joint pain. 

21. A probe, primer or antibody which is capable of selectively detecting a polymorphism 
listed in Table 1 which is associated with susceptibility to joint space narrowing and/or osteophyte 
development and/or joint pain. 

22. A vector comprising the polynucleotide of claim 11. 

23. A host cell line comprising the vector of claim 22. 

24. A nonhuman animal which is transgenic for the polynucleotide of claim 11. 

25. A cell line comprising the polynucleotide of claim 11. 

26. A method of using a cell line of claim 25 to screen for an agent for diagnosis of an 
individual having susceptibility to joint space narrowing and/or osteophyte development and/or joint 
pain. 

27. A method of using a nonhuman animal of claim 24 to screen for an agent for diagnosis of 
an individual having susceptibility to joint space narrowing and/or osteophyte development and/or joint 
pain. 

28. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or 
osteophyte development and/or joint pain comprising an agent for detection of the polynucleotide of 
claim 11. 

29. The kit of claim 28 further comprising instructions for use of said agent for detection of
said polynucleotide. 
30. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or
osteophyte development and/or joint pain comprising an agent for detection of the fragment of a 
polynucleotide of claim 12. 

31. The kit of claim 30 further comprising instructions for use of said agent for detection of 
said fragment. 

32. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or 
osteophyte development and/or joint pain comprising the probe, primer or antibody of claim 21. 

33. The kit of claim 32 further comprising instructions for use of said probe, primer or 
antibody. 